The word protein that I propose to you ... I would wish to
derive from proteios, because it appears to be the
primitive or principal substance of animal nutrition that
plants prepare for the herbivores, and which the latter
then furnish to the carnivores.

—J. J. Berzelius, letter to G. J. Mulder, 1838 


Proteins are the most abundant biological macromolecules,
occurring in all cells and all parts of cells. Proteins
also occur in great variety; thousands of different
kinds, ranging in size from relatively small peptides to
huge polymers with molecular weights in the millions,
may be found in a single cell. Moreover, proteins exhibit
enormous diversity of biological function and are the
most important final products of the information pathways
discussed in Part III of this book. Proteins are the
molecular instruments through which genetic information
is expressed.

Relatively simple monomeric subunits provide the
key to the structure of the thousands of different proteins.
All proteins, whether from the most ancient lines
of bacteria or from the most complex forms of life, are
constructed from the same ubiquitous set of 20 amino
acids, covalently linked in characteristic linear sequences.
Because each of these amino acids has a side chain with
distinctive chemical properties, this group of 20 precursor
molecules may be regarded as the alphabet in
which the language of protein structure is written.

What is most remarkable is that cells can produce
proteins with strikingly different properties and activities
by joining the same 20 amino acids in many different
combinations and sequences. From these building
blocks different organisms can make such widely diverse
products as enzymes, hormones, antibodies, transporters,
muscle fibers, the lens protein of the eye, feathers,
spider webs, rhinoceros horn, milk proteins, antibiotics,
mushroom poisons, and myriad other substances
having distinct biological activities (Fig. 3-1). Among
these protein products, the enzymes are the most varied
and specialized. Virtually all cellular reactions are
catalyzed by enzymes.

Protein structure and function are the topics of this
and the next three chapters. We begin with a description
of the fundamental chemical properties of amino
acids, peptides, and proteins.


3.1 Amino Acids 

Protein Architecture—Amino Acids 

Proteins are polymers of amino acids, with each amino
acid residue joined to its neighbor by a specific type
of covalent bond. (The term “residue” reflects the loss
of the elements of water when one amino acid is joined
to another.) Proteins can be broken down (hydrolyzed)
to their constituent amino acids by a variety of methods,
and the earliest studies of proteins naturally focused on
the free amino acids derived from them. Twenty different
amino acids are commonly found in proteins. The
first to be discovered was asparagine, in 1806. The last
of the 20 to be found, threonine, was not identified until
1938. All the amino acids have trivial or common names,
in some cases derived from the source from which they
were first isolated. Asparagine was first found in asparagus,
and glutamate in wheat gluten; tyrosine was
first isolated from cheese (its name is derived from the
Greek tyros, “cheese”); and glycine (Greek glykos,
“sweet”) was so named because of its sweet taste.

FIGURE 3-1 Some functions of proteins. (a) The light produced by
fireflies is the result of a reaction involving luciferin and
ATP, catalyzed by the enzyme luciferase (see Box 13-2). (b) Erythrocytes
contain large amounts of the oxygen-transporting protein
hemoglobin. (c) The protein keratin, formed by all vertebrates, is the
chief structural component of hair, scales, horn, wool, nails, and
feathers. The black rhinoceros is nearing extinction in the wild because of
the belief prevalent in some parts of the world that a powder derived
from its horn has aphrodisiac properties. In reality, the chemical
properties of powdered rhinoceros horn are no different from those of
powdered bovine hooves or human fingernails.

Amino Acids Share Common Structural Features 

All 20 of the common amino acids are α-amino acids.
They have a carboxyl group and an amino group bonded
to the same carbon atom (the α carbon) (Fig. 3-2). They
differ from each other in their side chains, or R groups,
which vary in structure, size, and electric charge, and
which influence the solubility of the amino acids in water.
In addition to these 20 amino acids there are many
less common ones. Some are residues modified after a
protein has been synthesized; others are amino acids
present in living organisms but not as constituents of
proteins. The common amino acids of proteins have
been assigned three-letter abbreviations and one-letter
symbols (Table 3-1), which are used as shorthand to
indicate the composition and sequence of amino acids
polymerized in proteins.

          COO⁻
           |
   ⁺H₃N - Cα - H
           |
           R

FIGURE 3-2 General structure of an amino acid. This structure is
common to all but one of the α-amino acids. (Proline, a cyclic amino
acid, is the exception.) The R group or side chain (red) attached to the
α carbon (blue) is different in each amino acid.
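
As a small illustration of how these symbols serve as shorthand, the following Python sketch converts between three-letter abbreviations and one-letter symbols. The pairs are those listed in Table 3-1; the names THREE_TO_ONE, ONE_TO_THREE, to_one_letter, and to_three_letter are illustrative choices rather than part of any standard library, and the example peptide is arbitrary.

```python
# Three-letter abbreviations and one-letter symbols of the 20 common
# amino acids, as listed in Table 3-1.
THREE_TO_ONE = {
    "Gly": "G", "Ala": "A", "Pro": "P", "Val": "V", "Leu": "L",
    "Ile": "I", "Met": "M", "Phe": "F", "Tyr": "Y", "Trp": "W",
    "Ser": "S", "Thr": "T", "Cys": "C", "Asn": "N", "Gln": "Q",
    "Lys": "K", "His": "H", "Arg": "R", "Asp": "D", "Glu": "E",
}

# Reverse lookup, built from the table above.
ONE_TO_THREE = {one: three for three, one in THREE_TO_ONE.items()}


def to_one_letter(residues):
    """Convert a list of three-letter abbreviations to a one-letter string."""
    return "".join(THREE_TO_ONE[name] for name in residues)


def to_three_letter(sequence, separator="-"):
    """Convert a one-letter sequence to separator-joined three-letter codes."""
    return separator.join(ONE_TO_THREE[symbol] for symbol in sequence)


if __name__ == "__main__":
    peptide = ["Ser", "Gly", "Tyr", "Ala", "Leu"]  # arbitrary example
    print(to_one_letter(peptide))
    print(to_three_letter("SGYAL"))
```

An unknown abbreviation raises a KeyError here, which is a deliberate simplification; a fuller treatment would also need to handle the uncommon residues discussed later in the chapter.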

Two conventions are used to identify the carbons in
an amino acid, a practice that can be confusing. The
additional carbons in an R group are commonly designated
β, γ, δ, ε, and so forth, proceeding out from the
α carbon. For most other organic molecules, carbon
atoms are simply numbered from one end, giving highest
priority (C-1) to the carbon with the substituent
containing the atom of highest atomic number. Within this
latter convention, the carboxyl carbon of an amino acid
would be C-1 and the α carbon would be C-2. In some
cases, such as amino acids with heterocyclic R groups,
the Greek lettering system is ambiguous and the numbering
convention is therefore used.

  ε      δ      γ      β      α
  6      5      4      3      2     1
 CH₂  - CH₂  - CH₂  - CH₂  - CH  - COO⁻
  |                          |
 ⁺NH₃                       ⁺NH₃

Lysine
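
The correspondence between the two conventions can be written out explicitly. The sketch below assumes only what the text states (the carboxyl carbon is C-1 and the α carbon is C-2) and continues the pattern along the chain; the names GREEK_LETTERS, carbon_number, and greek_label are hypothetical helpers introduced for illustration.

```python
# Greek letters for the carbons of an amino acid, starting at the
# alpha carbon and proceeding out along the R group.
GREEK_LETTERS = ["α", "β", "γ", "δ", "ε"]


def carbon_number(greek_letter):
    """Return n for the C-n position of a Greek-lettered carbon.

    The carboxyl carbon is C-1, so the α carbon is C-2, β is C-3,
    and so on.
    """
    return GREEK_LETTERS.index(greek_letter) + 2


def greek_label(number):
    """Return the Greek letter for carbon C-number (C-2 through C-6)."""
    if number < 2 or number > len(GREEK_LETTERS) + 1:
        raise ValueError("only C-2 through C-6 carry a Greek letter here")
    return GREEK_LETTERS[number - 2]


if __name__ == "__main__":
    # The second amino group of lysine sits on the ε carbon, which is C-6.
    print(carbon_number("ε"))
    print(greek_label(3))
```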

For all the common amino acids except glycine, the
α carbon is bonded to four different groups: a carboxyl
group, an amino group, an R group, and a hydrogen atom
(Fig. 3-2; in glycine, the R group is another hydrogen
atom). The α-carbon atom is thus a chiral center
(p. 17). Because of the tetrahedral arrangement of the
bonding orbitals around the α-carbon atom, the four different
groups can occupy two unique spatial arrangements,
and thus amino acids have two possible
stereoisomers. Since they are nonsuperimposable mirror
images of each other (Fig. 3-3), the two forms represent
a class of stereoisomers called enantiomers (see
Fig. 1-19). All molecules with a chiral center are also
optically active; that is, they rotate plane-polarized
light (see Box 1-2).
Special nomenclature has been developed to specify
the absolute configuration of the four substituents
of asymmetric carbon atoms. The absolute configurations
of simple sugars and amino acids are specified by
the D, L system (Fig. 3-4), based on the absolute
configuration of the three-carbon sugar glyceraldehyde, a
convention proposed by Emil Fischer in 1891. (Fischer
knew what groups surrounded the asymmetric carbon
of glyceraldehyde but had to guess at their absolute
configuration; his guess was later confirmed by x-ray
diffraction analysis.) For all chiral compounds,
stereoisomers having a configuration related to that of
L-glyceraldehyde are designated L, and stereoisomers
related to D-glyceraldehyde are designated D. The functional
groups of L-alanine are matched with those of
L-glyceraldehyde by aligning those that can be
interconverted by simple, one-step chemical reactions. Thus the
carboxyl group of L-alanine occupies the same position
about the chiral carbon as does the aldehyde group
of L-glyceraldehyde, because an aldehyde is readily
converted to a carboxyl group via a one-step oxidation.
Historically, the similar l and d designations were used
for levorotatory (rotating light to the left) and
dextrorotatory (rotating light to the right). However, not all
L-amino acids are levorotatory, and the convention
shown in Figure 3-4 was needed to avoid potential
ambiguities about absolute configuration. By Fischer’s
convention, L and D refer only to the absolute configuration
of the four substituents around the chiral carbon,
not to optical properties of the molecule.

(c)
         COO⁻                   COO⁻
          |                      |
  ⁺H₃N -  C  - H            H -  C  - NH₃⁺
          |                      |
         CH₃                    CH₃

       L-Alanine              D-Alanine

FIGURE 3-3 Stereoisomerism in α-amino acids. (a) The two
stereoisomers of alanine, L- and D-alanine, are nonsuperimposable mirror
images of each other (enantiomers). (b, c) Two different conventions for
showing the configurations in space of stereoisomers. In perspective
formulas (b) the solid wedge-shaped bonds project out of the plane
of the paper, the dashed bonds behind it. In projection formulas (c)
the horizontal bonds are assumed to project out of the plane of the
paper, the vertical bonds behind. However, projection formulas are
often used casually and are not always intended to portray a specific
stereochemical configuration.

FIGURE 3-4 Steric relationship of the stereoisomers of alanine to
the absolute configuration of L- and D-glyceraldehyde. In these
perspective formulas, the carbons are lined up vertically, with the chiral
atom in the center. The carbons in these molecules are numbered
beginning with the terminal aldehyde or carboxyl carbon (red), 1 to 3
from top to bottom as shown. When presented in this way, the R group
of the amino acid (in this case the methyl group of alanine) is always
below the α carbon. L-Amino acids are those with the α-amino group
on the left, and D-amino acids have the α-amino group on the right.

Another system of specifying configuration around
a chiral center is the RS system, which is used in the
systematic nomenclature of organic chemistry and
describes more precisely the configuration of molecules
with more than one chiral center (see p. 18).

The Amino Acid Residues in Proteins
Are L Stereoisomers

Nearly all biological compounds with a chiral center
occur naturally in only one stereoisomeric form, either D
or L. The amino acid residues in protein molecules are
exclusively L stereoisomers. D-Amino acid residues have
been found only in a few, generally small peptides,
including some peptides of bacterial cell walls and certain
peptide antibiotics.

It is remarkable that virtually all amino acid residues
in proteins are L stereoisomers. When chiral compounds
are formed by ordinary chemical reactions, the result is
a racemic mixture of D and L isomers, which are difficult
for a chemist to distinguish and separate. But to a
living system, D and L isomers are as different as the
right hand and the left. The formation of stable,
repeating substructures in proteins (Chapter 4) generally
requires that their constituent amino acids be of one
stereochemical series. Cells are able to specifically
synthesize the L isomers of amino acids because the active
sites of enzymes are asymmetric, causing the reactions
they catalyze to be stereospecific.
TABLE 3-1 Properties and Conventions Associated with the Common Amino Acids Found in Proteins

                                    pKa values
Amino acid      Abbreviation/ Mr    pK1      pK2      pKR        pI     Hydropathy  Occurrence in
                symbol              (-COOH)  (-NH3+)  (R group)         index*      proteins (%)†

Nonpolar, aliphatic R groups
Glycine         Gly G         75    2.34     9.60                5.97   -0.4        7.2
Alanine         Ala A         89    2.34     9.69                6.01   1.8         7.8
Proline         Pro P         115   1.99     10.96               6.48   1.6         5.2
Valine          Val V         117   2.32     9.62                5.97   4.2         6.6
Leucine         Leu L         131   2.36     9.60                5.98   3.8         9.1
Isoleucine      Ile I         131   2.36     9.68                6.02   4.5         5.3
Methionine      Met M         149   2.28     9.21                5.74   1.9         2.3

Aromatic R groups
Phenylalanine   Phe F         165   1.83     9.13                5.48   2.8         3.9
Tyrosine        Tyr Y         181   2.20     9.11     10.07      5.66   -1.3        3.2
Tryptophan      Trp W         204   2.38     9.39                5.89   -0.9        1.4

Polar, uncharged R groups
Serine          Ser S         105   2.21     9.15                5.68   -0.8        6.8
Threonine       Thr T         119   2.11     9.62                5.87   -0.7        5.9
Cysteine        Cys C         121   1.96     10.28    8.18       5.07   2.5         1.9
Asparagine      Asn N         132   2.02     8.80                5.41   -3.5        4.3
Glutamine       Gln Q         146   2.17     9.13                5.65   -3.5        4.2

Positively charged R groups
Lysine          Lys K         146   2.18     8.95     10.53      9.74   -3.9        5.9
Histidine       His H         155   1.82     9.17     6.00       7.59   -3.2        2.3
Arginine        Arg R         174   2.17     9.04     12.48      10.76  -4.5        5.1

Negatively charged R groups
Aspartate       Asp D         133   1.88     9.60     3.65       2.77   -3.5        5.3
Glutamate       Glu E         147   2.19     9.67     4.25       3.22   -3.5        6.3

*A scale combining hydrophobicity and hydrophilicity of R groups; it can be used to measure the tendency of an amino acid to seek an aqueous
environment (negative values) or a hydrophobic environment (positive values). See Chapter 11. From Kyte, J. & Doolittle, R.F. (1982) A simple method for
displaying the hydropathic character of a protein. J. Mol. Biol. 157, 105-132.

†Average occurrence in more than 1,150 proteins. From Doolittle, R.F. (1989) Redundancies in protein sequences. In Prediction of Protein Structure
and the Principles of Protein Conformation (Fasman, G.D., ed.), pp. 599-623, Plenum Press, New York.


Amino Acids Can Be Classified by R Group 

Knowledge of the chemical properties of the common
amino acids is central to an understanding of biochemistry.
The topic can be simplified by grouping the amino
acids into five main classes based on the properties of
their R groups (Table 3-1), in particular, their polarity,
or tendency to interact with water at biological pH (near
pH 7.0). The polarity of the R groups varies widely, from
nonpolar and hydrophobic (water-insoluble) to highly
polar and hydrophilic (water-soluble).

The structures of the 20 common amino acids are
shown in Figure 3-5, and some of their properties are
listed in Table 3-1. Within each class there are gradations
of polarity, size, and shape of the R groups.
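
The five classes can be captured in a simple lookup. The following Python sketch uses the class assignments of Table 3-1 and tallies how many residues of a sequence fall into each class; the names R_GROUP_CLASSES, CLASS_OF, classify, and class_composition are illustrative, and the example sequence is arbitrary.

```python
from collections import Counter

# One-letter symbols grouped by R group class, following Table 3-1.
R_GROUP_CLASSES = {
    "nonpolar, aliphatic": "GAPVLIM",
    "aromatic": "FYW",
    "polar, uncharged": "STCNQ",
    "positively charged": "KHR",
    "negatively charged": "DE",
}

# Reverse lookup: one-letter symbol -> class name.
CLASS_OF = {
    symbol: name
    for name, symbols in R_GROUP_CLASSES.items()
    for symbol in symbols
}


def classify(symbol):
    """Return the R group class of a one-letter amino acid symbol."""
    return CLASS_OF[symbol.upper()]


def class_composition(sequence):
    """Count the residues of a one-letter sequence in each R group class."""
    return Counter(classify(symbol) for symbol in sequence)


if __name__ == "__main__":
    for name, count in class_composition("GAVLKDE").items():
        print(f"{name}: {count}")
```

A single label per residue hides the gradations within each class noted above, so a tally of this kind is only a coarse summary of a sequence.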

Nonpolar, Aliphatic R Groups. The R groups in this class of
amino acids are nonpolar and hydrophobic. The side
chains of alanine, valine, leucine, and isoleucine
tend to cluster together within proteins, stabilizing protein
structure by means of hydrophobic interactions.
Glycine has the simplest structure. Although it is formally
nonpolar, its very small side chain makes no real
contribution to hydrophobic interactions. Methionine,
one of the two sulfur-containing amino acids, has a nonpolar
thioether group in its side chain. Proline has an
aliphatic side chain with a distinctive cyclic structure. The
secondary amino (imino) group of proline residues is
held in a rigid conformation that reduces the structural
flexibility of polypeptide regions containing proline.

Aromatic R Groups. Phenylalanine, tyrosine, and tryptophan,
with their aromatic side chains, are relatively
nonpolar (hydrophobic). All can participate in hydrophobic
interactions. The hydroxyl group of tyrosine
can form hydrogen bonds, and it is an important functional
group in some enzymes. Tyrosine and tryptophan
are significantly more polar than phenylalanine, because
of the tyrosine hydroxyl group and the nitrogen of the
tryptophan indole ring.

Tryptophan and tyrosine, and to a much lesser extent
phenylalanine, absorb ultraviolet light (Fig. 3-6;
Box 3-1). This accounts for the characteristic strong
absorbance of light by most proteins at a wavelength of
280 nm, a property exploited by researchers in the
characterization of proteins.

FIGURE 3-5 The 20 common amino acids of proteins. The structural
formulas show the state of ionization that would predominate at pH
7.0. The unshaded portions are those common to all the amino acids;
the portions shaded in red are the R groups. Although the R group of
histidine is shown uncharged, its pKa (see Table 3-1) is such that a
small but significant fraction of these groups are positively charged at
pH 7.0.
Polar, Uncharged R Groups. The R groups of these amino
acids are more soluble in water, or more hydrophilic,
than those of the nonpolar amino acids, because they
contain functional groups that form hydrogen bonds
with water. This class of amino acids includes serine,
threonine, cysteine, asparagine, and glutamine.
The polarity of serine and threonine is contributed by
their hydroxyl groups; that of cysteine by its sulfhydryl
group; and that of asparagine and glutamine by their
amide groups.

Asparagine and glutamine are the amides of two
other amino acids also found in proteins, aspartate and
glutamate, respectively, to which asparagine and glutamine
are easily hydrolyzed by acid or base. Cysteine is
readily oxidized to form a covalently linked dimeric
amino acid called cystine, in which two cysteine molecules
or residues are joined by a disulfide bond
(Fig. 3-7). The disulfide-linked residues are strongly
hydrophobic (nonpolar). Disulfide bonds play a special
role in the structures of many proteins by forming covalent
links between parts of a protein molecule or between
two different polypeptide chains.

Positively Charged (Basic) R Groups. The most hydrophilic
R groups are those that are either positively or negatively
charged. The amino acids in which the R groups
have significant positive charge at pH 7.0 are lysine,
which has a second primary amino group at the ε position
on its aliphatic chain; arginine, which has a positively
charged guanidino group; and histidine, which
has an imidazole group. Histidine is the only common
amino acid having an ionizable side chain with a pKa
near neutrality. In many enzyme-catalyzed reactions, a
His residue facilitates the reaction by serving as a proton
donor/acceptor.

FIGURE 3-6 Absorption of ultraviolet light by aromatic amino acids.
Comparison of the light absorption spectra of the aromatic amino acids
tryptophan and tyrosine at pH 6.0. The amino acids are present in
equimolar amounts (10⁻³ M) under identical conditions. The measured
absorbance of tryptophan is as much as four times that of tyrosine.
Note that the maximum light absorption for both tryptophan and
tyrosine occurs near a wavelength of 280 nm. Light absorption by the
third aromatic amino acid, phenylalanine (not shown), generally
contributes little to the spectroscopic properties of proteins.

FIGURE 3-7 Reversible formation of a disulfide bond by the oxidation
of two molecules of cysteine. Disulfide bonds between Cys
residues stabilize the structures of many proteins.

Negatively Charged (Acidic) R Groups. The two amino acids
having R groups with a net negative charge at pH 7.0
are aspartate and glutamate, each of which has a second
carboxyl group.

Uncommon Amino Acids Also Have 
Important Functions 

In addition to the 20 common amino acids, proteins
may contain residues created by modification of common
residues already incorporated into a polypeptide
(Fig. 3-8a). Among these uncommon amino acids
are 4-hydroxyproline, a derivative of proline, and
5-hydroxylysine, derived from lysine. The former is
found in plant cell wall proteins, and both are found in
collagen, a fibrous protein of connective tissues.
6-N-Methyllysine is a constituent of myosin, a contractile
protein of muscle. Another important uncommon amino
acid is γ-carboxyglutamate, found in the blood-clotting
protein prothrombin and in certain other proteins
that bind Ca²⁺ as part of their biological function.
More complex is desmosine, a derivative of four Lys
residues, which is found in the fibrous protein elastin.

Selenocysteine is a special case. This rare amino
acid residue is introduced during protein synthesis
rather than created through a postsynthetic modification.
It contains selenium rather than the sulfur of cysteine.
Actually derived from serine, selenocysteine is a
constituent of just a few known proteins.

Some 300 additional amino acids have been found
in cells. They have a variety of functions but are not
constituents of proteins. Ornithine and citrulline
(Fig. 3-8b) deserve special note because they are key
intermediates (metabolites) in the biosynthesis of arginine
(Chapter 22) and in the urea cycle (Chapter 18).

FIGURE 3-8 Uncommon amino acids. (a) Some uncommon amino
acids found in proteins. All are derived from common amino acids.
Extra functional groups added by modification reactions are shown in
red. Desmosine is formed from four Lys residues (the four carbon
backbones are shaded in yellow). Note the use of either numbers or Greek
letters to identify the carbon atoms in these structures. (b) Ornithine
and citrulline, which are not found in proteins, are intermediates in
the biosynthesis of arginine and in the urea cycle.

Amino Acids Can Act as Acids and Bases

When an amino acid is dissolved in water, it exists in
solution as the dipolar ion, or zwitterion (German for
“hybrid ion”), shown in Figure 3-9. A zwitterion can act
as either an acid (proton donor):

    ⁺H₃N-CH(R)-COO⁻  ⇌  H₂N-CH(R)-COO⁻  +  H⁺

or a base (proton acceptor):

    ⁺H₃N-CH(R)-COO⁻  +  H⁺  ⇌  ⁺H₃N-CH(R)-COOH

Substances having this dual nature are amphoteric
and are often called ampholytes (from “amphoteric
electrolytes”). A simple monoamino monocarboxylic
α-amino acid, such as alanine, is a diprotic acid when fully
protonated: it has two groups, the -COOH group and
the -NH₃⁺ group, that can yield protons:

    ⁺H₃N-CH(R)-COOH  ⇌  ⁺H₃N-CH(R)-COO⁻ + H⁺  ⇌  H₂N-CH(R)-COO⁻ + 2H⁺
    net charge: +1          net charge: 0              net charge: -1

FIGURE 3-9 Nonionic and zwitterionic forms of amino acids. The
nonionic form does not occur in significant amounts in aqueous
solutions. The zwitterion predominates at neutral pH.
BOX 3-1 WORKING IN BIOCHEMISTRY

Absorption of Light by Molecules:
The Lambert-Beer Law

A wide range of biomolecules absorb light at characteristic
wavelengths, just as tryptophan absorbs light at
280 nm (see Fig. 3-6). Measurement of light absorption
by a spectrophotometer is used to detect and identify
molecules and to measure their concentration in
solution. The fraction of the incident light absorbed by
a solution at a given wavelength is related to the thickness
of the absorbing layer (path length) and the concentration
of the absorbing species (Fig. 1). These two
relationships are combined into the Lambert-Beer law,

$$\log \frac{I_0}{I} = \varepsilon c l$$

where I₀ is the intensity of the incident light, I is the
intensity of the transmitted light, ε is the molar extinction
coefficient (in units of liters per mole-centimeter),
c is the concentration of the absorbing species (in
moles per liter), and l is the path length of the
light-absorbing sample (in centimeters). The Lambert-Beer
law assumes that the incident light is parallel and
monochromatic (of a single wavelength) and that the
solvent and solute molecules are randomly oriented.
The expression log (I₀/I) is called the absorbance,
designated A.

It is important to note that each successive millimeter
of path length of absorbing solution in a 1.0 cm
cell absorbs not a constant amount but a constant fraction
of the light that is incident upon it. However, with
an absorbing layer of fixed path length, the absorbance,
A, is directly proportional to the concentration
of the absorbing solute.

The molar extinction coefficient varies with the
nature of the absorbing compound, the solvent, and
the wavelength, and also with pH if the light-absorbing
species is in equilibrium with an ionization state that
has different absorbance properties.

FIGURE 1 The principal components of a
spectrophotometer. A light source emits
light along a broad spectrum, then the
monochromator selects and transmits light
of a particular wavelength. The monochromatic
light passes through the sample in a
cuvette of path length l and is absorbed by
the sample in proportion to the concentration
of the absorbing species. The transmitted
light is measured by a detector.
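
The Lambert-Beer law, A = log (I₀/I) = εcl, can be rearranged to give the concentration of an absorbing species from a measured absorbance. The Python sketch below does this under the assumptions of the law stated in the box; the function names are illustrative, and the extinction coefficient and intensities in the example are hypothetical values chosen only to show the arithmetic, not measured data.

```python
import math


def absorbance(incident, transmitted):
    """Absorbance A = log10(I0 / I) from incident and transmitted intensity."""
    return math.log10(incident / transmitted)


def concentration(a, epsilon, path_cm=1.0):
    """Concentration (mol/L) from A = epsilon * c * l.

    a        -- absorbance (dimensionless)
    epsilon  -- molar extinction coefficient (L per mol per cm)
    path_cm  -- path length of the cuvette in centimeters
    """
    return a / (epsilon * path_cm)


if __name__ == "__main__":
    EPSILON = 5600.0  # hypothetical extinction coefficient, L/(mol cm)
    a = absorbance(incident=100.0, transmitted=27.5)  # hypothetical reading
    c = concentration(a, EPSILON, path_cm=1.0)
    print(f"A = {a:.3f}, c = {c:.2e} mol/L")
```

Because A is proportional to both c and l, doubling the path length at a fixed absorbance halves the calculated concentration.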
Amino Acids Have Characteristic Titration Curves

Acid-base titration involves the gradual addition or
removal of protons (Chapter 2). Figure 3-10 shows the
titration curve of the diprotic form of glycine. The plot
has two distinct stages, corresponding to deprotonation
of two different groups on glycine. Each of the two
stages resembles in shape the titration curve of a
monoprotic acid, such as acetic acid (see Fig. 2-17),
and can be analyzed in the same way. At very low pH,
the predominant ionic species of glycine is the fully
protonated form, ⁺H₃N-CH₂-COOH. At the midpoint in
the first stage of the titration, in which the -COOH
group of glycine loses its proton, equimolar concentrations
of the proton-donor (⁺H₃N-CH₂-COOH) and
proton-acceptor (⁺H₃N-CH₂-COO⁻) species are
present. At the midpoint of any titration, a point of
inflection is reached where the pH is equal to the pKa of
the protonated group being titrated (see Fig. 2-18). For
glycine, the pH at the midpoint is 2.34, thus its -COOH
group has a pKa (labeled pK1 in Fig. 3-10) of 2.34.
(Recall from Chapter 2 that pH and pKa are simply
convenient notations for proton concentration and the
equilibrium constant for ionization, respectively. The
pKa is a measure of the tendency of a group to give up
a proton, with that tendency decreasing tenfold as the
pKa increases by one unit.) As the titration proceeds,
another important point is reached at pH 5.97. Here
there is another point of inflection, at which removal of
the first proton is essentially complete and removal of
the second has just begun. At this pH glycine is
present largely as the dipolar ion ⁺H₃N-CH₂-COO⁻.
We shall return to the significance of this inflection
point in the titration curve (labeled pI in Fig. 3-10)
shortly.

The second stage of the titration corresponds to the
removal of a proton from the -NH₃⁺ group of glycine.
The pH at the midpoint of this stage is 9.60, equal to
the pKa (labeled pK2 in Fig. 3-10) for the -NH₃⁺ group.
The titration is essentially complete at a pH of about 12,
at which point the predominant form of glycine is
H₂N-CH₂-COO⁻.
FIGURE 3-10 Titration of an amino acid. Shown here is the titration
curve of 0.1 M glycine at 25 °C. The ionic species predominating at
key points in the titration are shown above the graph. The shaded
boxes, centered at about pK1 = 2.34 and pK2 = 9.60, indicate the
regions of greatest buffering power.


From the titration curve of glycine we can derive
several important pieces of information. First, it gives a
quantitative measure of the pKa of each of the two
ionizing groups: 2.34 for the -COOH group and 9.60 for
the -NH₃⁺ group. Note that the carboxyl group of
glycine is over 100 times more acidic (more easily
ionized) than the carboxyl group of acetic acid, which, as
we saw in Chapter 2, has a pKa of 4.76, about average
for a carboxyl group attached to an otherwise
unsubstituted aliphatic hydrocarbon. The perturbed pKa of
glycine is caused by repulsion between the departing
proton and the nearby positively charged amino group
on the α-carbon atom, as described in Figure 3-11. The
opposite charges on the resulting zwitterion are
stabilizing, nudging the equilibrium farther to the right.
Similarly, the pKa of the amino group in glycine is perturbed
downward relative to the average pKa of an amino group.
This effect is due partly to the electronegative oxygen
atoms in the carboxyl groups, which tend to pull
electrons toward them, increasing the tendency of the amino
group to give up a proton. Hence, the α-amino group
has a pKa that is lower than that of an aliphatic amine
such as methylamine (Fig. 3-11). In short, the pKa of
any functional group is greatly affected by its chemical
environment, a phenomenon sometimes exploited in the
active sites of enzymes to promote exquisitely adapted
reaction mechanisms that depend on the perturbed pKa
values of proton donor/acceptor groups of specific
residues.
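
The statement that the carboxyl group of glycine is over 100 times more acidic than that of acetic acid follows from the logarithmic definition of pKa: a difference of one pKa unit corresponds to a tenfold difference in the acid dissociation constant. A minimal Python check, using only the two pKa values quoted above (the function name acidity_ratio is illustrative):

```python
def acidity_ratio(pka_stronger, pka_weaker):
    """Ratio of dissociation constants Ka(stronger) / Ka(weaker).

    Since pKa = -log10(Ka), the ratio is 10 ** (pKa_weaker - pKa_stronger).
    """
    return 10 ** (pka_weaker - pka_stronger)


if __name__ == "__main__":
    # Carboxyl group of glycine (pKa 2.34) versus acetic acid (pKa 4.76).
    ratio = acidity_ratio(2.34, 4.76)
    print(f"Ka ratio: about {ratio:.0f}")
```

The difference of 2.42 pKa units gives a ratio of roughly 260, consistent with "over 100 times more acidic."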
FIGURE 3-11 Effect of the chemical environment on pKa. The pKa
values for the ionizable groups in glycine are lower than those for
simple, methyl-substituted amino and carboxyl groups. These downward
perturbations of pKa are due to intramolecular interactions. Similar
effects can be caused by chemical groups that happen to be positioned
nearby, for example, in the active site of an enzyme.
The second piece of information provided by the
titration curve of glycine is that this amino acid has two
regions of buffering power. One of these is the relatively
flat portion of the curve, extending for approximately
1 pH unit on either side of the first pKa of 2.34,
indicating that glycine is a good buffer near this pH. The
other buffering zone is centered around pH 9.60. (Note
that glycine is not a good buffer at the pH of intracellular
fluid or blood, about 7.4.) Within the buffering
ranges of glycine, the Henderson-Hasselbalch equation
(see Box 2-3) can be used to calculate the proportions
of proton-donor and proton-acceptor species of glycine
required to make a buffer at a given pH.
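
As a worked illustration of that calculation, the Python sketch below applies the Henderson-Hasselbalch relationship, pH = pKa + log ([proton acceptor]/[proton donor]), to the amino group of glycine (pK2 = 9.60). The function names and the target pH of 9.0 are illustrative assumptions, not values taken from the text.

```python
def acceptor_to_donor_ratio(ph, pka):
    """[proton acceptor] / [proton donor] from Henderson-Hasselbalch."""
    return 10 ** (ph - pka)


def acceptor_fraction(ph, pka):
    """Fraction of the ionizable group present as the proton acceptor."""
    ratio = acceptor_to_donor_ratio(ph, pka)
    return ratio / (1.0 + ratio)


if __name__ == "__main__":
    PK2_GLYCINE = 9.60  # pKa of the amino group of glycine
    target_ph = 9.0     # hypothetical buffer pH within the buffering range
    ratio = acceptor_to_donor_ratio(target_ph, PK2_GLYCINE)
    print(f"acceptor/donor ratio: {ratio:.3f}")
    print(f"acceptor fraction:    {acceptor_fraction(target_ph, PK2_GLYCINE):.3f}")
```

At a pH equal to the pKa the ratio is exactly 1, which is why the buffering zones are centered on the two pKa values.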

Titration Curves Predict the Electric Charge 
of Amino Acids 

Another important piece of information derived from 
the titration curve of an amino acid is the relationship 
between its net electric charge and the pH of the solu¬ 
tion. At pH 5.97, the point of inflection between the 
two stages in its titration curve, glycine is present pre¬ 
dominantly as its dipolar form, fully ionized but with no 
net electric charge (Fig. 3-10). The characteristic pH 
at which the net electric charge is zero is called the 
isoelectric point or isoelectric pH, designated pi. 
For glycine, which has no ionizable group in its side 
chain, the isoelectric point is simply the arithmetic mean 
of the two p K a values: 

pi = j (pK i + pK 2 ) = j (2.34 + 9.60) = 5.97 

As is evident in Figure 3-10, glycine has a net negative
charge at any pH above its pI and will thus move toward
the positive electrode (the anode) when placed in an
electric field. At any pH below its pI, glycine has a net
positive charge and will move toward the negative electrode
(the cathode). The farther the pH of a glycine solution
is from its isoelectric point, the greater the net
electric charge of the population of glycine molecules.
At pH 1.0, for example, glycine exists almost entirely as
the form ⁺H₃N-CH₂-COOH, with a net positive
charge of 1.0. At pH 2.34, where there is an equal mixture
of ⁺H₃N-CH₂-COOH and ⁺H₃N-CH₂-COO⁻,
the average or net positive charge is 0.5. The sign and
the magnitude of the net charge of any amino acid at
any pH can be predicted in the same way.
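
The prediction described above can be sketched numerically. The short Python example below treats each ionizable group of glycine independently with the Henderson-Hasselbalch relation and sums the average charges. It is an illustrative sketch under that independence assumption; the function name is hypothetical, and the pKa values are those given in the text.

```python
def glycine_net_charge(ph: float, pk1: float = 2.34, pk2: float = 9.60) -> float:
    """Average net charge per glycine molecule at a given pH.

    Assumes the carboxyl and amino groups ionize independently.
    """
    # Fraction of carboxyl groups deprotonated; each contributes a charge of -1.
    carboxylate = 1.0 / (1.0 + 10.0 ** (pk1 - ph))
    # Fraction of amino groups protonated; each contributes a charge of +1.
    ammonium = 1.0 / (1.0 + 10.0 ** (ph - pk2))
    return ammonium - carboxylate


for ph in (1.0, 2.34, 5.97, 9.60, 12.0):
    print(f"pH {ph:5.2f}: net charge {glycine_net_charge(ph):+.3f}")
```

Under these assumptions the computed charge is about +0.5 at pH 2.34 and essentially zero at pH 5.97, in line with the values discussed in the text. At pH 1.0 it comes out slightly below +1, because a small fraction of the carboxyl groups is already deprotonated.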

Amino Acids Differ in Their Acid-Base Properties 

The shared properties of many amino acids permit some
simplifying generalizations about their acid-base behaviors.
First, all amino acids with a single α-amino group,
a single α-carboxyl group, and an R group that does not
ionize have titration curves resembling that of glycine
(Fig. 3-10). These amino acids have very similar, although
not identical, pKa values: pKa of the -COOH
group in the range of 1.8 to 2.4, and pKa of the -NH₃⁺
group in the range of 8.8 to 11.0 (Table 3-1).

Second, amino acids with an ionizable R group have
more complex titration curves, with three stages corresponding
to the three possible ionization steps; thus
they have three pKa values. The additional stage for the
titration of the ionizable R group merges to some extent
with the other two. The titration curves for two amino
acids of this type, glutamate and histidine, are shown in
Figure 3-12. The isoelectric points reflect the nature of
the ionizing R groups present. For example, glutamate
has a pI of 3.22, considerably lower than that of glycine.
This is due to the presence of two carboxyl groups,
which, at the average of their pKa values (3.22), contribute
a net charge of −1 that balances the +1 contributed
by the amino group. Similarly, the pI of histidine,
with two groups that are positively charged when
protonated, is 7.59 (the average of the pKa values of the
amino and imidazole groups), much higher than that of
glycine.

FIGURE 3-12 Titration curves for (a) glutamate and (b) histidine. The
pKa of the R group is designated here as pKR.
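
The averaging rule used above can be checked against a direct numerical search for the pH of zero net charge. The Python sketch below is illustrative only. It assumes independent ionization of each group, and it uses pKa values for glutamate (2.19, 4.25, 9.67) and histidine (1.82, 6.00, 9.17) that are taken here as typical tabulated values; they are an assumption of this example rather than numbers quoted in this passage. The function names are hypothetical.

```python
def net_charge(ph, acidic_pkas, basic_pkas):
    """Average net charge; acidic groups are 0/-1, basic groups are +1/0."""
    negative = sum(1.0 / (1.0 + 10.0 ** (pka - ph)) for pka in acidic_pkas)
    positive = sum(1.0 / (1.0 + 10.0 ** (ph - pka)) for pka in basic_pkas)
    return positive - negative


def isoelectric_point(acidic_pkas, basic_pkas, lo=0.0, hi=14.0, tol=1e-6):
    """Bisection search for the pH at which the net charge is zero.

    Net charge falls monotonically as pH rises, so bisection is sufficient.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if net_charge(mid, acidic_pkas, basic_pkas) > 0:
            lo = mid  # still net positive: the pI lies at higher pH
        else:
            hi = mid
    return (lo + hi) / 2.0


# (acidic pKa values, basic pKa values); glutamate and histidine values are assumed.
AMINO_ACIDS = {
    "glycine": ([2.34], [9.60]),
    "glutamate": ([2.19, 4.25], [9.67]),
    "histidine": ([1.82], [6.00, 9.17]),
}

for name, (acidic, basic) in AMINO_ACIDS.items():
    print(f"{name}: pI is approximately {isoelectric_point(acidic, basic):.2f}")
```

With these inputs the search lands within about 0.01 pH unit of the averaged values given in the text, which suggests the averaging shortcut is a good approximation when the remaining group is almost fully ionized at the pI.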

Finally, as pointed out earlier, under the general
condition of free and open exposure to the aqueous environment,
only histidine has an R group (pKa = 6.0)
providing significant buffering power near the neutral
pH usually found in the intracellular and extracellular
fluids of most animals and bacteria (Table 3-1).

SUMMARY 3.1 Amino Acids 


■ The 20 amino acids commonly found as
residues in proteins contain an α-carboxyl
group, an α-amino group, and a distinctive R
group substituted on the α-carbon atom. The
α-carbon atom of all amino acids except glycine
is asymmetric, and thus amino acids can exist
in at least two stereoisomeric forms. Only the
L stereoisomers, with a configuration related to
the absolute configuration of the reference
molecule L-glyceraldehyde, are found in
proteins.

■ Other, less common amino acids also occur, 
either as constituents of proteins (through 
modification of common amino acid residues 
after protein synthesis) or as free metabolites. 

■ Amino acids are classified into five types on the 
basis of the polarity and charge (at pH 7) of 
their R groups. 

■ Amino acids vary in their acid-base properties
and have characteristic titration curves.
Monoamino monocarboxylic amino acids (with
nonionizable R groups) are diprotic acids
(⁺H₃NCH(R)COOH) at low pH and exist in
several different ionic forms as the pH is
increased. Amino acids with ionizable R groups
have additional ionic species, depending on the
pH of the medium and the pKa of the R group.


3.2 Peptides and Proteins 

We now turn to polymers of amino acids, the peptides 
and proteins. Biologically occurring polypeptides range 
in size from small to very large, consisting of two or 
three to thousands of linked amino acid residues. Our 
focus is on the fundamental chemical properties of these 
polymers. 


Peptides Are Chains of Amino Acids 

Two amino acid molecules can be covalently joined
through a substituted amide linkage, termed a peptide
bond, to yield a dipeptide. Such a linkage is formed by
removal of the elements of water (dehydration) from
the α-carboxyl group of one amino acid and the α-amino
group of another (Fig. 3-13). Peptide bond formation is
an example of a condensation reaction, a common class
of reactions in living cells. Under standard biochemical
conditions, the equilibrium for the reaction shown in Figure
3-13 favors the amino acids over the dipeptide. To
make the reaction thermodynamically more favorable,
the carboxyl group must be chemically modified or activated
so that the hydroxyl group can be more readily
eliminated. A chemical approach to this problem is outlined
later in this chapter. The biological approach to
peptide bond formation is a major topic of Chapter 27.

Three amino acids can be joined by two peptide 
bonds to form a tripeptide; similarly, amino acids can be 
linked to form tetrapeptides, pentapeptides, and so 
forth. When a few amino acids are joined in this fash¬ 
ion, the structure is called an oligopeptide. When many 
amino acids are joined, the product is called a polypep¬ 
tide. Proteins may have thousands of amino acid 
residues. Although the terms “protein” and “polypep¬ 
tide” are sometimes used interchangeably, molecules re¬ 
ferred to as polypeptides generally have molecular 
weights below 10,000, and those called proteins have 
higher molecular weights. 

Figure 3-14 shows the structure of a pentapeptide.
As already noted, an amino acid unit in a peptide is often
called a residue (the part left over after losing a hydrogen
atom from its amino group and the hydroxyl moiety
from its carboxyl group). In a peptide, the amino
acid residue at the end with a free α-amino group is the
amino-terminal (or N-terminal) residue; the residue
at the other end, which has a free carboxyl group, is the
carboxyl-terminal (C-terminal) residue.

FIGURE 3-13 Formation of a peptide bond by condensation. The α-amino
group of one amino acid (with R2 group) acts as a nucleophile
to displace the hydroxyl group of another amino acid (with R1 group),
forming a peptide bond (shaded in yellow). Amino groups are good
nucleophiles, but the hydroxyl group is a poor leaving group and is
not readily displaced. At physiological pH, the reaction shown does
not occur to any appreciable extent.

FIGURE 3-14 The pentapeptide serylglycyltyrosylalanylleucine, or
Ser-Gly-Tyr-Ala-Leu. Peptides are named beginning with the amino-terminal
residue, which by convention is placed at the left. The peptide
bonds are shaded in yellow; the R groups are in red.

Although hydrolysis of a peptide bond is an exergonic
reaction, it occurs slowly because of its high activation
energy. As a result, the peptide bonds in proteins
are quite stable, with an average half-life (t1/2) of about
7 years under most intracellular conditions.


Peptides Can Be Distinguished by Their
Ionization Behavior

Peptides contain only one free α-amino group and one
free α-carboxyl group, at opposite ends of the chain
(Fig. 3-15). These groups ionize as they do in free amino
acids, although the ionization constants are different because
an oppositely charged group is no longer linked
to the α carbon. The α-amino and α-carboxyl groups of
all nonterminal amino acids are covalently joined in the
peptide bonds, which do not ionize and thus do not contribute
to the total acid-base behavior of peptides. However,
the R groups of some amino acids can ionize (Table
3-1), and in a peptide these contribute to the overall
acid-base properties of the molecule (Fig. 3-15). Thus
the acid-base behavior of a peptide can be predicted
from its free α-amino and α-carboxyl groups as well as
the nature and number of its ionizable R groups.

FIGURE 3-15 Alanylglutamylglycyllysine. This tetrapeptide has one
free α-amino group, one free α-carboxyl group, and two ionizable R
groups. The groups ionized at pH 7.0 are in red.

Like free amino acids, peptides have characteristic
titration curves and a characteristic isoelectric pH (pI)
at which they do not move in an electric field. These
properties are exploited in some of the techniques used
to separate peptides and proteins, as we shall see later
in the chapter. It should be emphasized that the pKa
value for an ionizable R group can change somewhat
when an amino acid becomes a residue in a peptide. The
loss of charge in the α-carboxyl and α-amino groups,
the interactions with other peptide R groups, and other
environmental factors can affect the pKa. The pKa values
for R groups listed in Table 3-1 can be a useful guide
to the pH range in which a given group will ionize, but
they cannot be strictly applied to peptides.

Biologically Active Peptides and Polypeptides
Occur in a Vast Range of Sizes


No generalizations can be made about the molecular 
weights of biologically active peptides and proteins in re¬ 
lation to their functions. Naturally occurring peptides 
range in length from two to many thousands of amino 
acid residues. Even the smallest peptides can have bio¬ 
logically important effects. Consider the commercially 
synthesized dipeptide L-aspartyl-L-phenylalanine methyl 
ester, the artificial sweetener better known as aspartame 
or NutraSweet. 


[Structure: L-Aspartyl-L-phenylalanine methyl ester (aspartame)]

Many small peptides exert their effects at very low 
concentrations. For example, a number of vertebrate 
hormones (Chapter 23) are small peptides. These in¬ 
clude oxytocin (nine amino acid residues), which is se¬ 
creted by the posterior pituitary and stimulates uterine 
contractions; bradykinin (nine residues), which inhibits 
inflammation of tissues; and thyrotropin-releasing fac¬ 
tor (three residues), which is formed in the hypothala¬ 
mus and stimulates the release of another hormone, 
thyrotropin, from the anterior pituitary gland. Some 
extremely toxic mushroom poisons, such as amanitin, 
are also small peptides, as are many antibiotics. 

Slightly larger are small polypeptides and oligopeptides
such as the pancreatic hormone insulin, which contains
two polypeptide chains, one having 30 amino acid
residues and the other 21. Glucagon, another pancreatic
hormone, has 29 residues; it opposes the action of
insulin. Corticotropin is a 39-residue hormone of the anterior
pituitary gland that stimulates the adrenal cortex.

How long are the polypeptide chains in proteins? As 
Table 3-2 shows, lengths vary considerably. Human cyto¬ 
chrome c has 104 amino acid residues linked in a single 
chain; bovine chymotrypsinogen has 245 residues. At 
the extreme is titin, a constituent of vertebrate muscle, 
which has nearly 27,000 amino acid residues and a mo¬ 
lecular weight of about 3,000,000. The vast majority of 
naturally occurring proteins are much smaller than this, 
containing fewer than 2,000 amino acid residues. 

Some proteins consist of a single polypeptide chain,
but others, called multisubunit proteins, have two or
more polypeptides associated noncovalently (Table
3-2). The individual polypeptide chains in a multisubunit
protein may be identical or different. If at least two
are identical the protein is said to be oligomeric, and
the identical units (consisting of one or more polypeptide
chains) are referred to as protomers. Hemoglobin,
for example, has four polypeptide subunits: two
identical α chains and two identical β chains, all four
held together by noncovalent interactions. Each α subunit
is paired in an identical way with a β subunit within
the structure of this multisubunit protein, so that hemoglobin
can be considered either a tetramer of four
polypeptide subunits or a dimer of αβ protomers.

A few proteins contain two or more polypeptide 
chains linked covalently. For example, the two polypep¬ 
tide chains of insulin are linked by disulfide bonds. In 
such cases, the individual polypeptides are not consid¬ 
ered subunits but are commonly referred to simply as 
chains. 

We can calculate the approximate number of amino
acid residues in a simple protein containing no other
chemical constituents by dividing its molecular weight
by 110. Although the average molecular weight of the
20 common amino acids is about 138, the smaller amino
acids predominate in most proteins. If we take into account
the proportions in which the various amino acids
occur in proteins (Table 3-1), the average molecular
weight of protein amino acids is nearer to 128. Because
a molecule of water (Mr 18) is removed to create each
peptide bond, the average molecular weight of an amino
acid residue in a protein is about 128 − 18 = 110.
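
The rule of thumb above is easy to try against the entries of Table 3-2. The Python sketch below is a minimal illustration, not a precise method: the function name is hypothetical, the molecular weights and residue counts are copied from Table 3-2, and the four proteins were chosen only as examples.

```python
AVERAGE_RESIDUE_MASS = 110  # approximate mean residue weight (128 - 18), per the text


def estimate_residue_count(molecular_weight: float) -> int:
    """Approximate number of residues in a simple protein from its molecular weight."""
    return round(molecular_weight / AVERAGE_RESIDUE_MASS)


# (molecular weight, residues listed in Table 3-2)
EXAMPLES = {
    "Ribonuclease A": (13_700, 124),
    "Lysozyme": (13_930, 129),
    "Serum albumin": (68_500, 609),
    "Titin": (2_993_000, 26_926),
}

for name, (mw, listed) in EXAMPLES.items():
    print(f"{name}: estimated {estimate_residue_count(mw)}, listed {listed}")
```

The estimates agree with the listed counts to within a few percent for these examples. Larger deviations are to be expected for proteins that carry non-amino acid components or have unusual amino acid compositions.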

Polypeptides Have Characteristic 
Amino Acid Compositions 

Hydrolysis of peptides or proteins with acid yields a mix¬ 
ture of free a-amino acids. When completely hydrolyzed, 
each type of protein yields a characteristic proportion 
or mixture of the different amino acids. The 20 common 
amino acids almost never occur in equal amounts in a 
protein. Some amino acids may occur only once or not 
at all in a given type of protein; others may occur in 
large numbers. Table 3-3 shows the composition of the 
amino acid mixtures obtained on complete hydrolysis of 
bovine cytochrome c and chymotrypsinogen, the inac¬ 
tive precursor of the digestive enzyme chymotrypsin. 
These two proteins, with very different functions, also 
differ significantly in the relative numbers of each kind 
of amino acid they contain. 

Complete hydrolysis alone is not sufficient for an
exact analysis of amino acid composition, however, because
some side reactions occur during the procedure.
For example, the amide bonds in the side chains of asparagine
and glutamine are cleaved by acid treatment,
yielding aspartate and glutamate, respectively. The side
chain of tryptophan is almost completely degraded by
acid hydrolysis, and small amounts of serine, threonine,
and tyrosine are also lost. When a precise amino acid
composition is required, biochemists use additional procedures
to resolve the ambiguities that remain from acid
hydrolysis.

TABLE 3-2 Molecular Data on Some Proteins

Protein                             Molecular weight   Number of residues   Number of polypeptide chains
Cytochrome c (human)                          13,000                  104                              1
Ribonuclease A (bovine pancreas)              13,700                  124                              1
Lysozyme (chicken egg white)                  13,930                  129                              1
Myoglobin (equine heart)                      16,890                  153                              1
Chymotrypsin (bovine pancreas)                21,600                  241                              3
Chymotrypsinogen (bovine)                     22,000                  245                              1
Hemoglobin (human)                            64,500                  574                              4
Serum albumin (human)                         68,500                  609                              1
Hexokinase (yeast)                           102,000                  972                              2
RNA polymerase (E. coli)                     450,000                4,158                              5
Apolipoprotein B (human)                     513,000                4,536                              1
Glutamine synthetase (E. coli)               619,000                5,628                             12
Titin (human)                              2,993,000               26,926                              1

TABLE 3-3 Amino Acid Composition of Two Proteins

             Number of residues per molecule of protein*
Amino acid   Bovine cytochrome c   Bovine chymotrypsinogen
Ala                            6                        22
Arg                            2                         4
Asn                            5                        15
Asp                            3                         8
Cys                            2                        10
Gln                            3                        10
Glu                            9                         5
Gly                           14                        23
His                            3                         2
Ile                            6                        10
Leu                            6                        19
Lys                           18                        14
Met                            2                         2
Phe                            4                         6
Pro                            4                         9
Ser                            1                        28
Thr                            8                        23
Trp                            1                         8
Tyr                            4                         4
Val                            3                        23
Total                        104                       245

*In some common analyses, such as acid hydrolysis, Asp and Asn are not readily distinguished
from each other and are together designated Asx (or B). Similarly, when Glu and
Gln cannot be distinguished, they are together designated Glx (or Z). In addition, Trp is
destroyed. Additional procedures must be employed to obtain an accurate assessment of
complete amino acid content.
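
To make the ambiguity concrete, the following Python sketch shows how the true composition of bovine cytochrome c (counts copied from Table 3-3) would appear after acid hydrolysis, with Asp and Asn pooled as Asx, Glu and Gln pooled as Glx, and Trp missing. This is a simplified illustration: the function name is hypothetical, and the partial losses of Ser, Thr, and Tyr mentioned in the text are deliberately not modeled.

```python
# Residues per molecule of bovine cytochrome c, copied from Table 3-3.
CYTOCHROME_C = {
    "Ala": 6, "Arg": 2, "Asn": 5, "Asp": 3, "Cys": 2, "Gln": 3, "Glu": 9,
    "Gly": 14, "His": 3, "Ile": 6, "Leu": 6, "Lys": 18, "Met": 2, "Phe": 4,
    "Pro": 4, "Ser": 1, "Thr": 8, "Trp": 1, "Tyr": 4, "Val": 3,
}


def composition_after_acid_hydrolysis(composition):
    """Composition as it would be reported after acid hydrolysis (simplified)."""
    observed = {}
    for residue, count in composition.items():
        if residue in ("Asp", "Asn"):
            key = "Asx"  # side-chain amide cleaved: Asn is recovered as Asp
        elif residue in ("Glu", "Gln"):
            key = "Glx"  # side-chain amide cleaved: Gln is recovered as Glu
        elif residue == "Trp":
            continue     # tryptophan side chain is destroyed
        else:
            key = residue
        observed[key] = observed.get(key, 0) + count
    return observed


observed = composition_after_acid_hydrolysis(CYTOCHROME_C)
print("Asx:", observed["Asx"], "Glx:", observed["Glx"])
print("Residues accounted for:", sum(observed.values()), "of", sum(CYTOCHROME_C.values()))
```

The pooled Asx and Glx totals cannot be split back into their amide and acid forms from this analysis alone, which is why additional procedures are needed.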

Some Proteins Contain Chemical Groups 
Other Than Amino Acids 

Many proteins, for example the enzymes ribonuclease
A and chymotrypsinogen, contain only amino acid
residues and no other chemical constituents; these are
considered simple proteins. However, some proteins
contain permanently associated chemical components
in addition to amino acids; these are called conjugated
proteins. The non-amino acid part of a conjugated protein
is usually called its prosthetic group. Conjugated
proteins are classified on the basis of the chemical nature
of their prosthetic groups (Table 3-4); for example,
lipoproteins contain lipids, glycoproteins contain
sugar groups, and metalloproteins contain a specific
metal. A number of proteins contain more than one prosthetic
group. Usually the prosthetic group plays an important
role in the protein’s biological function.

TABLE 3-4 Conjugated Proteins

Class             Prosthetic group         Example
Lipoproteins      Lipids                   β1-Lipoprotein of blood
Glycoproteins     Carbohydrates            Immunoglobulin G
Phosphoproteins   Phosphate groups         Casein of milk
Hemoproteins      Heme (iron porphyrin)    Hemoglobin
Flavoproteins     Flavin nucleotides       Succinate dehydrogenase
Metalloproteins   Iron                     Ferritin
                  Zinc                     Alcohol dehydrogenase
                  Calcium                  Calmodulin
                  Molybdenum               Dinitrogenase
                  Copper                   Plastocyanin

There Are Several Levels of Protein Structure 

For large macromolecules such as proteins, the tasks of 
describing and understanding structure are approached 
at several levels of complexity, arranged in a kind of con¬ 
ceptual hierarchy. Four levels of protein structure are 
commonly defined (Fig. 3-16). A description of all co¬ 
valent bonds (mainly peptide bonds and disulfide 
bonds) linking amino acid residues in a polypeptide 
chain is its primary structure. The most important el¬ 
ement of primary structure is the sequence of amino 
acid residues. Secondary structure refers to particu¬ 
larly stable arrangements of amino acid residues giving 
rise to recurring structural patterns. Tertiary struc¬ 
ture describes all aspects of the three-dimensional fold¬ 
ing of a polypeptide. When a protein has two or more 
polypeptide subunits, their arrangement in space is re¬ 
ferred to as quaternary structure. Primary structure 
is the focus of Section 3.4; the higher levels of structure 
are discussed in Chapter 4. 

FIGURE 3-16 Levels of structure in proteins. The primary structure
consists of a sequence of amino acids linked together by peptide bonds
and includes any disulfide bonds. The resulting polypeptide can be
coiled into units of secondary structure, such as an α helix. The helix
is a part of the tertiary structure of the folded polypeptide, which
is itself one of the subunits that make up the quaternary structure of
the multisubunit protein, in this case hemoglobin.

SUMMARY 3.2 Peptides and Proteins


■ Amino acids can be joined covalently through
peptide bonds to form peptides and proteins.
Cells generally contain thousands of different
proteins, each with a different biological activity.

■ Proteins can be very long polypeptide chains of
100 to several thousand amino acid residues.
However, some naturally occurring peptides
have only a few amino acid residues. Some
proteins are composed of several noncovalently
associated polypeptide chains, called subunits.
Simple proteins yield only amino acids on
hydrolysis; conjugated proteins contain in
addition some other component, such as a
metal or organic prosthetic group.

■ The sequence of amino acids in a protein is
characteristic of that protein and is called its
primary structure. This is one of four generally
recognized levels of protein structure.


3.3 Working with Proteins 

Our understanding of protein structure and function has 
been derived from the study of many individual proteins. 
To study a protein in detail, the researcher must be able 
to separate it from other proteins and must have the 
techniques to determine its properties. The necessary 
methods come from protein chemistry, a discipline as 
old as biochemistry itself and one that retains a central 
position in biochemical research. 

Proteins Can Be Separated and Purified 

A pure preparation is essential before a protein’s prop¬ 
erties and activities can be determined. Given that cells 
contain thousands of different kinds of proteins, how 
can one protein be purified? Methods for separating pro¬ 
teins take advantage of properties that vary from one 
protein to the next, including size, charge, and binding 
properties. 

The source of a protein is generally tissue or mi¬ 
crobial cells. The first step in any protein purification 
procedure is to break open these cells, releasing their 
proteins into a solution called a crude extract. If nec¬ 
essary, differential centrifugation can be used to pre¬ 


pare subcellular fractions or to isolate specific or¬ 
ganelles (see Fig. 1-8). 

Once the extract or organelle preparation is ready,
various methods are available for purifying one or more
of the proteins it contains. Commonly, the extract is subjected
to treatments that separate the proteins into different
fractions based on a property such as size or
charge, a process referred to as fractionation. Early
fractionation steps in a purification utilize differences in
protein solubility, which is a complex function of pH,
temperature, salt concentration, and other factors. The
solubility of proteins is generally lowered at high salt
concentrations, an effect called “salting out.” The addition
of a salt in the right amount can selectively precipitate
some proteins, while others remain in solution.
Ammonium sulfate ((NH₄)₂SO₄) is often used for this
purpose because of its high solubility in water.

A solution containing the protein of interest often
must be further altered before subsequent purification
steps are possible. For example, dialysis is a procedure
that separates proteins from small solutes by taking advantage
of the proteins’ larger size. The partially purified
extract is placed in a bag or tube made of a semipermeable
membrane. When this is suspended in a much
larger volume of buffered solution of appropriate ionic
strength, the membrane allows the exchange of salt and
buffer but not proteins. Thus dialysis retains large proteins
within the membranous bag or tube while allowing
the concentration of other solutes in the protein
preparation to change until they come into equilibrium
with the solution outside the membrane. Dialysis might
be used, for example, to remove ammonium sulfate from
the protein preparation.

The most powerful methods for fractionating proteins
make use of column chromatography, which
takes advantage of differences in protein charge, size,
binding affinity, and other properties (Fig. 3-17). A
porous solid material with appropriate chemical properties
(the stationary phase) is held in a column, and a
buffered solution (the mobile phase) percolates through
it. The protein-containing solution, layered on the top
of the column, percolates through the solid matrix as an
ever-expanding band within the larger mobile phase
(Fig. 3-17). Individual proteins migrate faster or more
slowly through the column depending on their properties.
For example, in cation-exchange chromatography
(Fig. 3-18a), the solid matrix has negatively
charged groups. In the mobile phase, proteins with a net
positive charge migrate through the matrix more slowly
than those with a net negative charge, because the migration
of the former is retarded more by interaction
with the stationary phase. The two types of protein can
separate into two distinct bands. The expansion of the
protein band in the mobile phase (the protein solution)
is caused both by separation of proteins with different
properties and by diffusional spreading. As the length
of the column increases, the resolution of two types of
protein with different net charges generally improves.
However, the rate at which the protein solution can flow
through the column usually decreases with column
length. And as the length of time spent on the column
increases, the resolution can decline as a result of diffusional
spreading within each protein band.
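
The qualitative behavior on a cation exchanger can be sketched with a small Python example. It is a deliberately crude illustration: the three proteins and their isoelectric points are hypothetical, the function names are invented for this sketch, and the isoelectric point is used only as a proxy for net charge at the buffer pH. Real elution positions also depend on the number and distribution of charged groups, the salt concentration, and the column itself.

```python
def net_charge_sign(isoelectric_point: float, buffer_ph: float) -> int:
    """Sign of the net charge: positive below the pI, negative above it."""
    if buffer_ph < isoelectric_point:
        return 1
    if buffer_ph > isoelectric_point:
        return -1
    return 0


def expected_cation_exchange_order(proteins: dict, buffer_ph: float) -> list:
    """Names ordered from earliest to latest expected elution.

    Assumption: at a fixed buffer pH, a lower pI means a more negative net
    charge, hence weaker retention on the negatively charged matrix.
    """
    return sorted(proteins, key=lambda name: proteins[name])


# Hypothetical proteins with made-up isoelectric points.
SAMPLE = {"protein_A": 4.5, "protein_B": 7.0, "protein_C": 9.5}
BUFFER_PH = 7.0

for name in expected_cation_exchange_order(SAMPLE, BUFFER_PH):
    sign = net_charge_sign(SAMPLE[name], BUFFER_PH)
    print(f"{name}: pI {SAMPLE[name]}, net charge sign {sign:+d}")
```

In this sketch the protein carrying a net negative charge at the buffer pH is listed first and the positively charged one last, matching the description of cation-exchange chromatography in the text.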

Figure 3-18 shows two other variations of column 
chromatography in addition to ion exchange. Size- 
exclusion chromatography separates proteins ac¬ 
cording to size. In this method, large proteins emerge 
from the column sooner than small ones—a somewhat 
counterintuitive result. The solid phase consists of 
beads with engineered pores or cavities of a particular 
size. Large proteins cannot enter the cavities, and so 
take a short (and rapid) path through the column, 
around the beads. Small proteins enter the cavities, and 
migrate through the column more slowly as a result (Fig. 
3-18b). Affinity chromatography is based on the 
binding affinity of a protein. The beads in the column 
have a covalently attached chemical group. A protein 
with affinity for this particular chemical group will bind 
to the beads in the column, and its migration will be re¬ 
tarded as a result (Fig. 3-18c). 

A modern refinement in chromatographic methods 
is HPLC, or high-performance liquid chromatogra¬ 
phy. HPLC makes use of high-pressure pumps that 
speed the movement of the protein molecules down the 
column, as well as higher-quality chromatographic ma¬ 
terials that can withstand the crushing force of the pres¬ 
surized flow. By reducing the transit time on the col¬ 
umn, HPLC can limit diffusional spreading of protein 
bands and thus greatly improve resolution. 

The approach to purification of a protein that has 
not previously been isolated is guided both by estab¬ 
lished precedents and by common sense. In most cases, 
several different methods must be used sequentially to 
purify a protein completely. The choice of method is 


FIGURE 3-17 Column chromatography. The standard elements of a 
chromatographic column include a solid, porous material supported 
inside a column, generally made of plastic or glass. The solid material 
(matrix) makes up the stationary phase through which flows a solu¬ 
tion, the mobile phase. The solution that passes out of the column at 
the bottom (the effluent) is constantly replaced by solution supplied 
from a reservoir at the top. The protein solution to be separated is lay¬ 
ered on top of the column and allowed to percolate into the solid 
matrix. Additional solution is added on top. The protein solution forms 
a band within the mobile phase that is initially the depth of the pro¬ 
tein solution applied to the column. As proteins migrate through the 
column, they are retarded to different degrees by their different inter¬ 
actions with the matrix material. The overall protein band thus widens 
as it moves through the column. Individual types of proteins (such as 
A, B, and C, shown in blue, red, and green) gradually separate from 
each other, forming bands within the broader protein band. Separa¬ 
tion improves (resolution increases) as the length of the column in¬ 
creases. However, each individual protein band also broadens with 
time due to diffusional spreading, a process that decreases resolution. 
In this example, protein A is well separated from B and C, but diffu¬ 
sional spreading prevents complete separation of B and C under these 
conditions. 





































FIGURE 3-18 Three chromatographic methods used in protein purification.
(a) Ion-exchange chromatography exploits differences in the
sign and magnitude of the net electric charges of proteins at a given
pH. The column matrix is a synthetic polymer containing bound
charged groups; those with bound anionic groups are called cation
exchangers, and those with bound cationic groups are called anion
exchangers. Ion-exchange chromatography on a cation exchanger is
shown here. The affinity of each protein for the charged groups on the
column is affected by the pH (which determines the ionization state
of the molecule) and the concentration of competing free salt ions in
the surrounding solution. Separation can be optimized by gradually
changing the pH and/or salt concentration of the mobile phase so as
to create a pH or salt gradient. (b) Size-exclusion chromatography,
also called gel filtration, separates proteins according to size. The
column matrix is a cross-linked polymer with pores of selected size.
Larger proteins migrate faster than smaller ones, because they are too
large to enter the pores in the beads and hence take a more direct
route through the column. The smaller proteins enter the pores and
are slowed by their more labyrinthine path through the column.
(c) Affinity chromatography separates proteins by their binding specificities.
The proteins retained on the column are those that bind
specifically to a ligand cross-linked to the beads. (In biochemistry, the
term "ligand" is used to refer to a group or molecule that binds to a
macromolecule such as a protein.) After proteins that do not bind to
the ligand are washed through the column, the bound protein of
particular interest is eluted (washed out of the column) by a solution
containing free ligand.

Panel annotations (Fig. 3-18):
(a) ... to column containing cation exchangers. Proteins move through
the column at rates determined by their net charge at the pH being
used. With cation exchangers, proteins with a more negative net charge
move faster and elute earlier.
(b) Protein molecules separate by size; larger molecules pass more
freely, appearing in the earlier fractions.
(c) Protein mixture is added to column containing a polymer-bound
ligand specific for protein of interest. Unwanted proteins are washed
through column. Protein of interest is eluted by ligand solution.


















































































TABLE 3-5 A Purification Table for a Hypothetical Enzyme

Procedure or step                         Fraction volume (ml)   Total protein (mg)   Activity (units)   Specific activity (units/mg)
1. Crude cellular extract                                1,400               10,000            100,000                             10
2. Precipitation with ammonium sulfate                     280                3,000             96,000                             32
3. Ion-exchange chromatography                              90                  400             80,000                            200
4. Size-exclusion chromatography                            80                  100             60,000                            600
5. Affinity chromatography                                   6                    3             45,000                         15,000

Note: All data represent the status of the sample after the designated procedure has been carried out. Activity and specific activity are defined on page 94.


somewhat empirical, and many protocols may be tried
before the most effective one is found. Trial and error
can often be minimized by basing the procedure on purification
techniques developed for similar proteins.
Published purification protocols are available for many
thousands of proteins. Common sense dictates that inexpensive
procedures such as salting out be used first,
when the total volume and the number of contaminants
are greatest. Chromatographic methods are often impractical
at early stages, because the amount of chromatographic
medium needed increases with sample
size. As each purification step is completed, the sample
size generally becomes smaller (Table 3-5), making it
feasible to use more sophisticated (and expensive)
chromatographic procedures at later stages.

Proteins Can Be Separated and Characterized 
by Electrophoresis 

Another important technique for the separation of proteins
is based on the migration of charged proteins in
an electric field, a process called electrophoresis.
These procedures are not generally used to purify proteins
in large amounts, because simpler alternatives are
usually available and electrophoretic methods often
adversely affect the structure and thus the function of
proteins. Electrophoresis is, however, especially useful
as an analytical method. Its advantage is that proteins
can be visualized as well as separated, permitting a
researcher to estimate quickly the number of different
proteins in a mixture or the degree of purity of a particular
protein preparation. Also, electrophoresis allows
determination of crucial properties of a protein such as
its isoelectric point and approximate molecular weight.

Electrophoresis of proteins is generally carried out
in gels made up of the cross-linked polymer polyacrylamide
(Fig. 3-19). The polyacrylamide gel acts as a molecular
sieve, slowing the migration of proteins approximately
in proportion to their charge-to-mass ratio.
Migration may also be affected by protein shape. In electrophoresis,
the force moving the macromolecule is the
electrical potential, E. The electrophoretic mobility of
the molecule, μ, is the ratio of the velocity of the
molecule, V, to the electrical potential. Electrophoretic
mobility is also equal to the net charge of
the molecule, Z, divided by the frictional coefficient, f,
which reflects in part a protein's shape. Thus:

$$\mu = \frac{V}{E} = \frac{Z}{f}$$

The migration of a protein in a gel during electrophoresis
is therefore a function of its size and its shape.

An electrophoretic method commonly employed for 
estimation of purity and molecular weight makes use of 
the detergent sodium dodecyl sulfate (SDS). 

Na+ -O-S(=O)2-O-(CH2)11CH3

Sodium dodecyl sulfate (SDS)

SDS binds to most proteins in amounts roughly proportional
to the molecular weight of the protein, about one
molecule of SDS for every two amino acid residues. The
bound SDS contributes a large net negative charge, rendering
the intrinsic charge of the protein insignificant
and conferring on each protein a similar charge-to-mass
ratio. In addition, the native conformation of a protein
is altered when SDS is bound, and most proteins assume
a similar shape. Electrophoresis in the presence of SDS
therefore separates proteins almost exclusively on the
basis of mass (molecular weight), with smaller polypeptides
migrating more rapidly. After electrophoresis, the
proteins are visualized by adding a dye such as
Coomassie blue, which binds to proteins but not to the
gel itself (Fig. 3-19b). Thus, a researcher can monitor
the progress of a protein purification procedure as the
number of protein bands visible on the gel decreases after
each new fractionation step. When compared with
the positions to which proteins of known molecular
weight migrate in the gel, the position of an unidentified
protein can provide an excellent measure of its molecular
weight (Fig. 3-20). If the protein has two or more
different subunits, the subunits will generally be separated
by the SDS treatment and a separate band will appear
for each.
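
The molecular weight estimate described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the M_r values are four of the marker proteins named in Fig. 3-20, but the relative migration values are hypothetical numbers invented for the example, and the helper names (`fit_line`, `estimate_mr`) are not from the text. The sketch fits a straight line to log M_r versus relative migration and reads the unknown off that line.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def estimate_mr(markers, unknown_migration):
    """Estimate M_r from relative migration on an SDS gel.

    markers: list of (M_r, relative_migration) pairs for the standards.
    """
    xs = [migration for _, migration in markers]
    ys = [math.log10(mr) for mr, _ in markers]
    slope, intercept = fit_line(xs, ys)
    return 10 ** (slope * unknown_migration + intercept)

# M_r values from Fig. 3-20; the migration values are HYPOTHETICAL.
markers = [
    (116_250, 0.20),  # beta-galactosidase
    (97_400, 0.26),   # glycogen phosphorylase b
    (66_200, 0.40),   # bovine serum albumin
    (45_000, 0.55),   # ovalbumin
]

# An unknown band that migrated between albumin and ovalbumin.
print(round(estimate_mr(markers, 0.47)))
```

Because smaller polypeptides migrate more rapidly, the fitted slope is negative: a band that runs farther down the gel is assigned a lower M_r.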
FIGURE 3-19 Electrophoresis. (a) Different samples are loaded in
wells or depressions at the top of the polyacrylamide gel. The proteins
move into the gel when an electric field is applied. The gel minimizes
convection currents caused by small temperature gradients, as well as
protein movements other than those induced by the electric field.
(b) Proteins can be visualized after electrophoresis by treating the gel
with a stain such as Coomassie blue, which binds to the proteins but
not to the gel itself. Each band on the gel represents a different protein
(or protein subunit); smaller proteins move through the gel more
rapidly than larger proteins and therefore are found nearer the bottom
of the gel. This gel illustrates the purification of the enzyme RNA
polymerase from E. coli. The first lane shows the proteins present in the
crude cellular extract. Successive lanes (left to right) show the proteins
present after each purification step. The purified protein contains four
subunits, as seen in the last lane on the right.


Isoelectric focusing is a procedure used to determine
the isoelectric point (pI) of a protein (Fig.
3-21). A pH gradient is established by allowing a mixture
of low molecular weight organic acids and bases
(ampholytes; p. 81) to distribute themselves in an electric
field generated across the gel. When a protein mixture
is applied, each protein migrates until it reaches
the pH that matches its pI (Table 3-6). Proteins with
different isoelectric points are thus distributed differently
throughout the gel.
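
A minimal sketch of why each protein stops at its own pI: a protein carries a net positive charge at any pH below its pI, a net negative charge above it, and no net charge when pH = pI, so it stops moving in the field at that point. The pI values below are taken from Table 3-6; the function name `net_charge_sign` is an illustrative choice, not something defined in the text.

```python
# pI values from Table 3-6 (pepsin is omitted because its pI is
# listed only as "<1.0").
PI_VALUES = {
    "Egg albumin": 4.6,
    "Serum albumin": 4.9,
    "Urease": 5.0,
    "beta-Lactoglobulin": 5.2,
    "Hemoglobin": 6.8,
    "Myoglobin": 7.0,
    "Chymotrypsinogen": 9.5,
    "Cytochrome c": 10.7,
    "Lysozyme": 11.0,
}

def net_charge_sign(pi, ph):
    """Return '+', '-', or '0' for the sign of the net charge at a given pH."""
    if ph < pi:
        return "+"
    if ph > pi:
        return "-"
    return "0"

# Sign of the net charge of each protein at pH 7.0.
for name, pi in PI_VALUES.items():
    print(f"{name:<20} pI {pi:>4}  net charge at pH 7.0: {net_charge_sign(pi, 7.0)}")
```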

Combining isoelectric focusing and SDS electrophoresis
sequentially in a process called two-dimensional
electrophoresis permits the resolution of complex
mixtures of proteins (Fig. 3-22). This is a more sensitive
analytical method than either electrophoretic
method alone. Two-dimensional electrophoresis separates
proteins of identical molecular weight that differ
in pI, or proteins with similar pI values but different
molecular weights.

[Figure 3-20 labels, M_r standards: Myosin; β-Galactosidase, 116,250;
Glycogen phosphorylase b, 97,400; Bovine serum albumin, 66,200;
Ovalbumin, 45,000; Carbonic anhydrase]

FIGURE 3-20 Estimating the molecular weight of a protein. The
electrophoretic mobility of a protein on an SDS polyacrylamide gel
is related to its molecular weight, M_r. (a) Standard proteins of
known molecular weight are subjected to electrophoresis (lane 1).
These marker proteins can be used to estimate the molecular
weight of an unknown protein (lane 2). (b) A plot of log M_r of the
marker proteins versus relative migration during electrophoresis is
linear, which allows the molecular weight of the unknown protein
to be read from the graph.

[Figure 3-21 panel labels: A stable pH gradient is established in the
gel after application of an electric field. Protein solution is added
and electric field is reapplied. After staining, proteins are shown to
be distributed along pH gradient according to their pI values.]

FIGURE 3-21 Isoelectric focusing. This
technique separates proteins according to
their isoelectric points. A stable pH gradient
is established in the gel by the addition of
appropriate ampholytes. A protein mixture
is placed in a well on the gel. With an
applied electric field, proteins enter the gel
and migrate until each reaches a pH
equivalent to its pI. Remember that when
pH = pI, the net charge of a protein is zero.

Unseparated Proteins Can Be Quantified 

To purify a protein, it is essential to have a way of detecting
and quantifying that protein in the presence of
many other proteins at each stage of the procedure.
Often, purification must proceed in the absence of any
information about the size and physical properties of the
protein or about the fraction of the total protein mass
it represents in the extract. For proteins that are enzymes,
the amount in a given solution or tissue extract
can be measured, or assayed, in terms of the catalytic
effect the enzyme produces, that is, the increase in
the rate at which its substrate is converted to reaction
products when the enzyme is present. For this purpose
one must know (1) the overall equation of the reaction
catalyzed, (2) an analytical procedure for determining
the disappearance of the substrate or the appearance of
a reaction product, (3) whether the enzyme requires cofactors
such as metal ions or coenzymes, (4) the dependence
of the enzyme activity on substrate concentration,
(5) the optimum pH, and (6) a temperature
zone in which the enzyme is stable and has high activity.
Enzymes are usually assayed at their optimum pH
and at some convenient temperature within the range
25 to 38 °C. Also, very high substrate concentrations are
generally used so that the initial reaction rate, measured
experimentally, is proportional to enzyme concentration
(Chapter 6).

By international agreement, 1.0 unit of enzyme activity
is defined as the amount of enzyme causing transformation
of 1.0 μmol of substrate per minute at 25 °C
under optimal conditions of measurement. The term
activity refers to the total units of enzyme in a solution.
The specific activity is the number of enzyme
units per milligram of total protein (Fig. 3-23). The specific
activity is a measure of enzyme purity: it increases
during purification of an enzyme and becomes maximal
and constant when the enzyme is pure (Table 3-5).
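
As a worked illustration of these definitions, the sketch below recomputes the specific activity column of Table 3-5 from its total protein and activity columns. The yield and fold-purification columns are additions that do not appear in the table; they are included on the assumption that both are measured relative to the crude extract. The class and function names are illustrative only.

```python
from dataclasses import dataclass

@dataclass
class Step:
    name: str
    total_protein_mg: float
    activity_units: float

    @property
    def specific_activity(self) -> float:
        """Enzyme units per milligram of total protein."""
        return self.activity_units / self.total_protein_mg

# Total protein and activity values from Table 3-5.
steps = [
    Step("Crude cellular extract", 10_000, 100_000),
    Step("Precipitation with ammonium sulfate", 3_000, 96_000),
    Step("Ion-exchange chromatography", 400, 80_000),
    Step("Size-exclusion chromatography", 100, 60_000),
    Step("Affinity chromatography", 3, 45_000),
]

def summarize(steps):
    """Return (name, specific activity, % yield, fold purification) per step.

    Yield and fold purification are computed relative to the first step
    (an assumption of this sketch, not a column of Table 3-5).
    """
    crude = steps[0]
    rows = []
    for step in steps:
        yield_pct = 100 * step.activity_units / crude.activity_units
        fold = step.specific_activity / crude.specific_activity
        rows.append((step.name, step.specific_activity, yield_pct, fold))
    return rows

rows = summarize(steps)
for name, spec_act, yield_pct, fold in rows:
    print(f"{name:<38} {spec_act:>8.0f} units/mg  "
          f"yield {yield_pct:5.1f}%  {fold:6.0f}-fold")
```

The output makes the trade-off explicit: total activity falls at every step, yet specific activity rises because unwanted protein is removed faster than activity is lost.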

TABLE 3-6 The Isoelectric Points of Some Proteins

Protein              pI
Pepsin               <1.0
Egg albumin          4.6
Serum albumin        4.9
Urease               5.0
β-Lactoglobulin      5.2
Hemoglobin           6.8
Myoglobin            7.0
Chymotrypsinogen     9.5
Cytochrome c         10.7
Lysozyme             11.0

FIGURE 3-22 Two-dimensional electrophoresis. (a) Proteins are first
separated by isoelectric focusing in a cylindrical gel. The gel is then
laid horizontally on a second, slab-shaped gel, and the proteins are
separated by SDS polyacrylamide gel electrophoresis. Horizontal
separation reflects differences in pI; vertical separation reflects differences
in molecular weight. (b) More than 1,000 different proteins from E.
coli can be resolved using this technique.


After each purification step, the activity of the
preparation (in units of enzyme activity) is assayed, the
total amount of protein is determined independently,
and the ratio of the two gives the specific activity.
Activity and total protein generally decrease with each
step. Activity decreases because some loss always occurs
due to inactivation or nonideal interactions with
chromatographic materials or other molecules in the
solution. Total protein decreases because the objective is
to remove as much unwanted or nonspecific protein as
possible. In a successful step, the loss of nonspecific protein
is much greater than the loss of activity; therefore,
specific activity increases even as total activity falls. The
data are then assembled in a purification table similar
to Table 3-5. A protein is generally considered pure
when further purification steps fail to increase specific
activity and when only a single protein species can be
detected (for example, by electrophoresis).

For proteins that are not enzymes, other quantification
methods are required. Transport proteins can be
assayed by their binding to the molecule they transport,
and hormones and toxins by the biological effect they
produce; for example, growth hormones will stimulate
the growth of certain cultured cells. Some structural
proteins represent such a large fraction of a tissue mass
that they can be readily extracted and purified without
a functional assay. The approaches are as varied as the
proteins themselves.



FIGURE 3-23 Activity versus specific activity. The difference between
these two terms can be illustrated by considering two beakers of
marbles. The beakers contain the same number of red marbles, but
different numbers of marbles of other colors. If the marbles represent
proteins, both beakers contain the same activity of the protein
represented by the red marbles. The second beaker, however, has the higher
specific activity because here the red marbles represent a much higher
fraction of the total.

SUMMARY 3.3 Working with Proteins


■ Proteins are separated and purified by taking 
advantage of differences in their properties. 
Proteins can be selectively precipitated by 
the addition of certain salts. A wide range of 
chromatographic procedures makes use of 
differences in size, binding affinities, charge, 
and other properties. These include ion- 
exchange, size-exclusion, affinity, and high- 
performance liquid chromatography. 

■ Electrophoresis separates proteins on the basis 
of mass or charge. SDS gel electrophoresis and 
isoelectric focusing can be used separately or 
in combination for higher resolution. 

■ All purification procedures require a method for 
quantifying or assaying the protein of interest 
in the presence of other proteins. Purification 
can be monitored by assaying specific activity. 


3.4 The Covalent Structure of Proteins 

Purification of a protein is usually only a prelude to a
detailed biochemical dissection of its structure and
function. What is it that makes one protein an enzyme,
another a hormone, another a structural protein, and
still another an antibody? How do they differ chemically?
The most obvious distinctions are structural, and these
distinctions can be approached at every level of structure
defined in Figure 3-16.

The differences in primary structure can be especially
informative. Each protein has a distinctive number
and sequence of amino acid residues. As we shall
see in Chapter 4, the primary structure of a protein determines
how it folds up into a unique three-dimensional
structure, and this in turn determines the function of
the protein. Primary structure is the focus of the remainder
of this chapter. We first consider empirical
clues that amino acid sequence and protein function are
closely linked, then describe how amino acid sequence
is determined; finally, we outline the many uses to which
this information can be put.

The Function of a Protein Depends on 
Its Amino Acid Sequence 

The bacterium Escherichia coli produces more than
3,000 different proteins; a human produces 25,000 to
35,000. In both cases, each type of protein has a unique
three-dimensional structure and this structure confers
a unique function. Each type of protein also has a unique
amino acid sequence. Intuition suggests that the amino
acid sequence must play a fundamental role in determining
the three-dimensional structure of the protein,
and ultimately its function, but is this supposition correct?
A quick survey of proteins and how they vary in
amino acid sequence provides a number of empirical
clues that help substantiate the important relationship
between amino acid sequence and biological function.

First, as we have already noted, proteins with different
functions always have different amino acid sequences.
Second, thousands of human genetic diseases
have been traced to the production of defective proteins.
Perhaps one-third of these proteins are defective
because of a single change in their amino acid sequence;
hence, if the primary structure is altered, the function
of the protein may also be changed. Finally, on comparing
functionally similar proteins from different
species, we find that these proteins often have similar
amino acid sequences. An extreme case is ubiquitin, a
76-residue protein involved in regulating the degradation
of other proteins. The amino acid sequence of ubiquitin
is identical in species as disparate as fruit flies and
humans.

Is the amino acid sequence absolutely fixed, or invariant,
for a particular protein? No; some flexibility is
possible. An estimated 20% to 30% of the proteins in
humans are polymorphic, having amino acid sequence
variants in the human population. Many of these variations
in sequence have little or no effect on the function
of the protein. Furthermore, proteins that carry out
a broadly similar function in distantly related species can
differ greatly in overall size and amino acid sequence.

Although the amino acid sequence in some regions
of the primary structure might vary considerably without
affecting biological function, most proteins contain
crucial regions that are essential to their function and
whose sequence is therefore conserved. The fraction of
the overall sequence that is critical varies from protein
to protein, complicating the task of relating sequence to
three-dimensional structure, and structure to function.
Before we can consider this problem further, however,
we must examine how sequence information is obtained.

The Amino Acid Sequences of Millions 
of Proteins Have Been Determined 

Two major discoveries in 1953 were of crucial importance
in the history of biochemistry. In that year James D.
Watson and Francis Crick deduced the double-helical
structure of DNA and proposed a structural basis for its
precise replication (Chapter 8). Their proposal illuminated
the molecular reality behind the idea of a gene.
In that same year, Frederick Sanger worked out the sequence
of amino acid residues in the polypeptide chains
of the hormone insulin (Fig. 3-24), surprising many
researchers who had long thought that elucidation of
the amino acid sequence of a polypeptide would be a
hopelessly difficult task. It quickly became evident that
the nucleotide sequence in DNA and the amino acid
sequence in proteins were somehow related. Barely a
decade after these discoveries, the role of the nucleotide
sequence of DNA in determining the amino acid sequence
of protein molecules was revealed (Chapter 27).
An enormous number of protein sequences can now be
derived indirectly from the DNA sequences in the rapidly
growing genome databases. However, many are still deduced
by traditional methods of polypeptide sequencing.

The amino acid sequences of thousands of different
proteins from many species have been determined using
principles first developed by Sanger. These methods
are still in use, although with many variations and improvements
in detail. Chemical protein sequencing now
complements a growing list of newer methods, providing
multiple avenues to obtain amino acid sequence
data. Such data are now critical to every area of biochemical
investigation.

FIGURE 3-24 Amino acid sequence
of bovine insulin. The two polypeptide
chains are joined by disulfide cross-linkages.
The A chain is identical in
human, pig, dog, rabbit, and sperm
whale insulins. The B chains of the
cow, pig, dog, goat, and horse are
identical.

Short Polypeptides Are Sequenced 
Using Automated Procedures 

Various procedures are used to analyze protein primary
structure. Several protocols are available to label and
identify the amino-terminal amino acid residue (Fig.
3-25a). Sanger developed the reagent 1-fluoro-2,4-dinitrobenzene
(FDNB) for this purpose; other reagents
used to label the amino-terminal residue, dansyl chloride
and dabsyl chloride, yield derivatives that are more
easily detectable than the dinitrophenyl derivatives. After
the amino-terminal residue is labeled with one of
these reagents, the polypeptide is hydrolyzed to its constituent
amino acids and the labeled amino acid is identified.
Because the hydrolysis stage destroys the
polypeptide, this procedure cannot be used to sequence
a polypeptide beyond its amino-terminal residue. However,
it can help determine the number of chemically
distinct polypeptides in a protein, provided each has a
different amino-terminal residue. For example, two
residues, Phe and Gly, would be labeled if insulin (Fig.
3-24) were subjected to this procedure.

FIGURE 3-25 Steps in sequencing a polypeptide. (a) Identification of
the amino-terminal residue can be the first step in sequencing a
polypeptide. Sanger's method for identifying the amino-terminal
residue is shown here. (b) The Edman degradation procedure reveals
the entire sequence of a peptide. For shorter peptides, this method
alone readily yields the entire sequence, and step (a) is often omitted.
Step (a) is useful in the case of larger polypeptides, which are often
fragmented into smaller peptides for sequencing (see Fig. 3-27).


To sequence an entire polypeptide, a chemical
method devised by Pehr Edman is usually employed.
The Edman degradation procedure labels and removes
only the amino-terminal residue from a peptide,
leaving all other peptide bonds intact (Fig. 3-25b). The
peptide is reacted with phenylisothiocyanate under
mildly alkaline conditions, which converts the amino-terminal
amino acid to a phenylthiocarbamoyl (PTC)
adduct. The peptide bond next to the PTC adduct is
then cleaved in a step carried out in anhydrous trifluoroacetic
acid, with removal of the amino-terminal amino
acid as an anilinothiazolinone derivative. The derivatized
amino acid is extracted with organic solvents, converted
to the more stable phenylthiohydantoin derivative
by treatment with aqueous acid, and then identified.
The use of sequential reactions carried out under first
basic and then acidic conditions provides control over
the entire process. Each reaction with the amino-terminal
amino acid can go essentially to completion
without affecting any of the other peptide bonds in the
peptide. After removal and identification of the amino-terminal
residue, the new amino-terminal residue so
exposed can be labeled, removed, and identified
through the same series of reactions. This procedure is
repeated until the entire sequence is determined. The
Edman degradation is carried out on a machine, called
a sequenator, that mixes reagents in the proper proportions,
separates the products, identifies them, and
records the results. These methods are extremely sensitive.
Often, the complete amino acid sequence can be
determined starting with only a few micrograms of
protein.

The length of polypeptide that can be accurately
sequenced by the Edman degradation depends on the
efficiency of the individual chemical steps. Consider a
peptide beginning with the sequence Gly-Pro-Lys- at
its amino terminus. If glycine were removed with 97%
efficiency, 3% of the polypeptide molecules in the solution
would retain a Gly residue at their amino terminus.
In the second Edman cycle, 97% of the liberated amino
acids would be proline, and 3% glycine, while 3% of the
polypeptide molecules would retain Gly (0.1%) or Pro
(2.9%) residues at their amino terminus. At each cycle,
peptides that did not react in earlier cycles would contribute
amino acids to an ever-increasing background,
eventually making it impossible to determine which
amino acid is next in the original peptide sequence.
Modern sequenators achieve efficiencies of better than
99% per cycle, permitting the sequencing of more than
50 contiguous amino acid residues in a polypeptide. The
primary structure of insulin, worked out by Sanger and
colleagues over a period of 10 years, could now be completely
determined in a day or two.
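
The arithmetic in this example can be followed cycle by cycle with a short simulation. This is a simplified sketch, assuming that a molecule that fails to react in a given cycle is simply left unchanged and falls one residue behind, as in the Gly-Pro-Lys example; side reactions and sample losses are ignored, and the function name is hypothetical.

```python
def correct_signal_by_cycle(efficiency, cycles):
    """Fraction of the liberated amino acids that is the 'correct' residue.

    lag[k] is the fraction of molecules that are k residues behind the
    ideal position. A molecule reacts in a cycle with probability
    `efficiency`; otherwise it falls one more residue behind.
    """
    lag = [1.0]
    signal = []
    for _ in range(cycles):
        released = [fraction * efficiency for fraction in lag]
        signal.append(released[0] / sum(released))
        new_lag = [0.0] * (len(lag) + 1)
        for k, fraction in enumerate(lag):
            new_lag[k] += fraction * efficiency            # stays k behind
            new_lag[k + 1] += fraction * (1 - efficiency)  # falls behind
        lag = new_lag
    return signal

for efficiency in (0.97, 0.99):
    signal = correct_signal_by_cycle(efficiency, 50)
    print(f"efficiency {efficiency:.0%}: "
          f"cycle 2 = {signal[1]:.3f}, cycle 50 = {signal[49]:.3f}")
```

Under this model the correct residue makes up efficiency^(n-1) of the amino acids liberated in cycle n, which reproduces the 97% proline figure for the second cycle. By cycle 50 the correct residue is roughly 22% of the signal at 97% efficiency but roughly 61% at 99%, which illustrates why small gains in per-cycle efficiency extend the readable length so much.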

Large Proteins Must Be Sequenced 
in Smaller Segments 

The overall accuracy of amino acid sequencing generally
declines as the length of the polypeptide increases.
The very large polypeptides found in proteins must be
broken down into smaller pieces to be sequenced efficiently.
There are several steps in this process. First, the
protein is cleaved into a set of specific fragments by
chemical or enzymatic methods. If any disulfide bonds
are present, they must be broken. Each fragment is purified,
then sequenced by the Edman procedure. Finally,
the order in which the fragments appear in the original
protein is determined and disulfide bonds (if any) are
located.

Breaking Disulfide Bonds Disulfide bonds interfere with
the sequencing procedure. A cystine residue (Fig. 3-7)
that has one of its peptide bonds cleaved by the Edman
procedure may remain attached to another polypeptide
strand via its disulfide bond. Disulfide bonds also interfere
with the enzymatic or chemical cleavage of the
polypeptide. Two approaches to irreversible breakage of
disulfide bonds are outlined in Figure 3-26.

Cleaving the Polypeptide Chain Several methods can be
used for fragmenting the polypeptide chain. Enzymes
called proteases catalyze the hydrolytic cleavage of
peptide bonds. Some proteases cleave only the peptide
bond adjacent to particular amino acid residues (Table
3-7) and thus fragment a polypeptide chain in a predictable
and reproducible way. A number of chemical
reagents also cleave the peptide bond adjacent to specific
residues.
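
Because such cleavage is predictable, it is easy to model. The sketch below applies two of the rules listed in Table 3-7, trypsin (carbonyl side of Lys or Arg) and cyanogen bromide (carbonyl side of Met), to a made-up peptide written in one-letter code. The peptide sequence and the function names are hypothetical, and the rule is idealized: it cuts at every matching residue, whereas a real digest may deviate.

```python
def cleave_after(sequence, residues):
    """Cleave on the carbonyl (C) side of every residue in `residues`."""
    fragments = []
    start = 0
    for i, amino_acid in enumerate(sequence):
        # Cutting after the last residue would produce an empty fragment.
        if amino_acid in residues and i < len(sequence) - 1:
            fragments.append(sequence[start:i + 1])
            start = i + 1
    fragments.append(sequence[start:])
    return fragments

def trypsin(sequence):
    """Idealized trypsin: cleaves after Lys (K) and Arg (R)."""
    return cleave_after(sequence, "KR")

def cyanogen_bromide(sequence):
    """Idealized cyanogen bromide: cleaves after Met (M)."""
    return cleave_after(sequence, "M")

# HYPOTHETICAL peptide containing five Lys/Arg residues and one Met.
peptide = "GASKLMTRWEKAVFRDYKQHG"

tryptic = trypsin(peptide)
print(tryptic)
print(cyanogen_bromide(peptide))
```

With five Lys or Arg residues the idealized digest gives six tryptic peptides, and every one except the carboxyl-terminal fragment ends in Lys or Arg.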

Among proteases, the digestive enzyme trypsin catalyzes
the hydrolysis of only those peptide bonds in
which the carbonyl group is contributed by either a Lys
or an Arg residue, regardless of the length or amino acid
sequence of the chain. The number of smaller peptides
produced by trypsin cleavage can thus be predicted
from the total number of Lys or Arg residues in the original
polypeptide, as determined by hydrolysis of an intact
sample (Fig. 3-27). A polypeptide with five Lys
and/or Arg residues will usually yield six smaller peptides
on cleavage with trypsin. Moreover, all except one
of these will have a carboxyl-terminal Lys or Arg. The
fragments produced by trypsin (or other enzyme or
chemical) action are then separated by chromatographic
or electrophoretic methods.

Sequencing of Peptides Each peptide fragment resulting
from the action of trypsin is sequenced separately by
the Edman procedure.

Ordering Peptide Fragments The order of the "trypsin
fragments" in the original polypeptide chain must now
be determined. Another sample of the intact polypeptide
is cleaved into fragments using a different enzyme
or reagent, one that cleaves peptide bonds at points
other than those cleaved by trypsin. For example,
cyanogen bromide cleaves only those peptide bonds in
which the carbonyl group is contributed by Met. The
fragments resulting from this second procedure are then
separated and sequenced as before.

The amino acid sequences of each fragment obtained
by the two cleavage procedures are examined,
with the objective of finding peptides from the second
procedure whose sequences establish continuity, because
of overlaps, between the fragments obtained by
the first cleavage procedure (Fig. 3-27). Overlapping
peptides obtained from the second fragmentation yield
the correct order of the peptide fragments produced in
the first. If the amino-terminal amino acid has been identified
before the original cleavage of the protein, this information
can be used to establish which fragment is
derived from the amino terminus. The two sets of fragments
can be compared for possible errors in determining
the amino acid sequence of each fragment. If
the second cleavage procedure fails to establish continuity
between all peptides from the first cleavage, a
third or even a fourth cleavage method must be used to
obtain a set of peptides that can provide the necessary
overlap(s).

Locating Disulfide Bonds If the primary structure includes
disulfide bonds, their locations are determined
in an additional step after sequencing is completed. A
sample of the protein is again cleaved with a reagent
such as trypsin, this time without first breaking the
disulfide bonds. The resulting peptides are separated by
electrophoresis and compared with the original set of
peptides generated by trypsin. For each disulfide bond,
two of the original peptides will be missing and a new,
larger peptide will appear. The two missing peptides
represent the regions of the intact polypeptide that are
linked by the disulfide bond.

FIGURE 3-26 Breaking disulfide bonds in proteins. Two common
methods are illustrated. Oxidation of a cystine residue with performic
acid produces two cysteic acid residues. Reduction by dithiothreitol
to form Cys residues must be followed by further modification of
the reactive -SH groups to prevent re-formation of the disulfide
bond. Acetylation by iodoacetate serves this purpose.

TABLE 3-7 The Specificity of Some Common
Methods for Fragmenting Polypeptide Chains

Reagent (biological source)*                               Cleavage points†
Trypsin (bovine pancreas)                                  Lys, Arg (C)
Submaxillarus protease (mouse submaxillary gland)          Arg (C)
Chymotrypsin (bovine pancreas)                             Phe, Trp, Tyr (C)
Staphylococcus aureus V8 protease (bacterium S. aureus)    Asp, Glu (C)
Asp-N-protease (bacterium Pseudomonas fragi)               Asp, Glu (N)
Pepsin (porcine stomach)                                   Phe, Trp, Tyr (N)
Endoproteinase Lys C (bacterium Lysobacter enzymogenes)    Lys (C)
Cyanogen bromide                                           Met (C)

*All reagents except cyanogen bromide are proteases. All are available from commercial sources.

†Residues furnishing the primary recognition point for the protease or reagent; peptide bond
cleavage occurs on either the carbonyl (C) or the amino (N) side of the indicated amino acid
residues.


Amino Acid Sequences Can Also Be Deduced 
by Other Methods 

The approach outlined above is not the only way to determine
amino acid sequences. New methods based on
mass spectrometry permit the sequencing of short
polypeptides (20 to 30 amino acid residues) in just a
few minutes (Box 3-2). In addition, with the development
of rapid DNA sequencing methods (Chapter 8),
the elucidation of the genetic code (Chapter 27), and
the development of techniques for isolating genes
(Chapter 9), researchers can deduce the sequence of a
polypeptide by determining the sequence of nucleotides
in the gene that codes for it (Fig. 3-28). The techniques
used to determine protein and DNA sequences are complementary.
When the gene is available, sequencing the
DNA can be faster and more accurate than sequencing
the protein. Most proteins are now sequenced in this indirect
way. If the gene has not been isolated, direct sequencing
of peptides is necessary, and this can provide
information (the location of disulfide bonds, for example)
not available in a DNA sequence. In addition, a
knowledge of the amino acid sequence of even a part of
a polypeptide can greatly facilitate the isolation of the
corresponding gene (Chapter 9).
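
Deducing a polypeptide sequence from a gene sequence can be sketched as a simple lookup, using the example of Fig. 3-28. The sketch assumes the standard genetic code (which this section does not list; it is covered in Chapter 27) and assumes the DNA shown is the coding strand, read three nucleotides at a time from its first base. The names `CODON_TABLE` and `translate` are illustrative.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in the conventional TCAG ordering; '*' marks stop.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}

def translate(dna):
    """Translate a coding-strand DNA sequence, stopping at a stop codon."""
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

gene = "CAGTATCCTACGATTTGG"  # DNA sequence from Fig. 3-28
protein = translate(gene)
print("-".join(THREE_LETTER[aa] for aa in protein))
```

The printed sequence matches the amino acid sequence shown above the DNA in Fig. 3-28.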

The array of methods now available to analyze both
proteins and nucleic acids is ushering in a new discipline
of "whole cell biochemistry." The complete sequence
of an organism's DNA, its genome, is now available
for organisms ranging from viruses to bacteria to
multicellular eukaryotes (see Table 1-4). Genes are being
discovered by the millions, including many that encode
proteins with no known function. To describe the
entire protein complement encoded by an organism's
DNA, researchers have coined the term proteome. As
described in Chapter 9, the new disciplines of genomics
and proteomics are complementing work carried out
on cellular intermediary metabolism and nucleic acid
metabolism to provide a new and increasingly complete
picture of biochemistry at the level of cells and even
organisms.

FIGURE 3-27 Cleaving proteins and sequencing and ordering the
peptide fragments. First, the amino acid composition and amino-terminal
residue of an intact sample are determined. Then any disulfide
bonds are broken before fragmenting so that sequencing can proceed
efficiently. In this example, there are only two Cys (C) residues and
thus only one possibility for location of the disulfide bond. In polypeptides
with three or more Cys residues, the position of disulfide bonds
can be determined as described in the text. (The one-letter symbols
for amino acids are given in Table 3-1.)

Amino acid sequence (protein)   Gln-Tyr-Pro-Thr-Ile-Trp
DNA sequence (gene)             CAG TAT CCT ACG ATT TGG

FIGURE 3-28 Correspondence of DNA and amino acid sequences.
Each amino acid is encoded by a specific sequence of three nucleotides
in DNA. The genetic code is described in detail in Chapter 27.
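
The correspondence in Figure 3-28 can be sketched as a codon lookup. The following is a minimal illustration, not a general translator: the table holds only the six codons from the figure, and the names `CODON_TO_RESIDUE` and `translate` are hypothetical.

```python
# Only the six codons that appear in Figure 3-28 (standard genetic code).
CODON_TO_RESIDUE = {
    "CAG": "Gln", "TAT": "Tyr", "CCT": "Pro",
    "ACG": "Thr", "ATT": "Ile", "TGG": "Trp",
}

def translate(dna):
    """Split a coding-strand DNA string into codons and look each one up."""
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    return "-".join(CODON_TO_RESIDUE[codon] for codon in codons)

print(translate("CAGTATCCTACGATTTGG"))
```

A codon outside this small table raises a `KeyError`; a full translator would need all 64 codons described in Chapter 27.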
BOX 3-2 WORKING IN BIOCHEMISTRY


Investigating Proteins with Mass Spectrometry 

The mass spectrometer has long been an indispensa¬ 
ble tool in chemistry. Molecules to be analyzed, re¬ 
ferred to as analytes, are first ionized in a vacuum. 
When the newly charged molecules are introduced 
into an electric and/or magnetic field, their paths 
through the field are a function of their mass-to-charge 
ratio, m/z. This measured property of the ionized
species can be used to deduce the mass (M) of the 
analyte with very high precision. 

Although mass spectrometry has been in use for 
many years, it could not be applied to macromolecules 
such as proteins and nucleic acids. The m/z measurements
are made on molecules in the gas phase, and
the heating or other treatment needed to transfer a 
macromolecule to the gas phase usually caused its 
rapid decomposition. In 1988, two different tech¬ 
niques were developed to overcome this problem. In 
one, proteins are placed in a light-absorbing matrix. 
With a short pulse of laser light, the proteins are ion¬ 
ized and then desorbed from the matrix into the vac¬ 
uum system. This process, known as matrix-assisted 
laser desorption/ionization mass spectrometry, 
or MALDI MS, has been successfully used to meas¬ 
ure the mass of a wide range of macromolecules. In a 
second and equally successful method, macromole¬ 
cules in solution are forced directly from the liquid to 
gas phase. A solution of analytes is passed through a 
charged needle that is kept at a high electrical po¬ 
tential, dispersing the solution into a fine mist of 
charged microdroplets. The solvent surrounding the 
macromolecules rapidly evaporates, and the resulting 
multiply charged macromolecular ions are thus intro¬ 
duced nondestructively into the gas phase. This tech¬ 
nique is called electrospray ionization mass spec¬ 
trometry, or ESI MS. Protons added during passage 
through the needle give additional charge to the 
macromolecule. The m/z of the molecule can be analyzed
in the vacuum chamber.

Mass spectrometry provides a wealth of informa¬ 
tion for proteomics research, enzymology, and protein 
chemistry in general. The techniques require only 
minuscule amounts of sample, so they can be readily
applied to the small amounts of protein that can be 
extracted from a two-dimensional electrophoretic gel. 
The accurately measured molecular mass of a protein 
is one of the critical parameters in its identification. 
Once the mass of a protein is accurately known, mass 
spectrometry is a convenient and accurate method for 
detecting changes in mass due to the presence of 
bound cofactors, bound metal ions, covalent modifi¬ 
cations, and so on. 


The process for determining the molecular mass 
of a protein with ESI MS is illustrated in Figure 1. As 
it is injected into the gas phase, a protein acquires a 
variable number of protons, and thus positive charges, 
from the solvent. This creates a spectrum of species 
with different mass-to-charge ratios. Each successive 
peak corresponds to a species that differs from that 
FIGURE 1 Electrospray mass spectrometry of a protein. (a) A protein
solution is dispersed into highly charged droplets by passage
through a needle under the influence of a high-voltage electric field. 
The droplets evaporate, and the ions (with added protons in this 
case) enter the mass spectrometer for m/z measurement. The spec¬ 
trum generated (b) is a family of peaks, with each successive peak 
(from right to left) corresponding to a charged species increased by 
1 in both mass and charge. A computer-generated transformation of 
this spectrum is shown in the inset. 





















































of its neighboring peak by a charge difference of 1 and
a mass difference of 1 (1 proton). The mass of the
protein can be determined from any two neighboring
peaks. The measured m/z of one peak is

$$(m/z)_2 = \frac{M + n_2 X}{n_2}$$

where $M$ is the mass of the protein, $n_2$ is the number
of charges, and $X$ is the mass of the added groups
(protons in this case). Similarly for the neighboring
peak,

$$(m/z)_1 = \frac{M + (n_2 + 1)X}{n_2 + 1}$$

We now have two unknowns ($M$ and $n_2$) and two equations.
Each equation can be rearranged to give $M$, and setting the
two expressions equal lets us solve first for $n_2$ and then for $M$:

$$n_2 = \frac{(m/z)_1 - X}{(m/z)_2 - (m/z)_1}$$

$$M = n_2\,[(m/z)_2 - X]$$

This calculation using the m/z values for any two
peaks in a spectrum such as that shown in Figure 1b
usually provides the mass of the protein (in this case,
aerolysin K; 47,342 Da) with an error of only ±0.01%.
Generating several sets of peaks, repeating the calculation,
and averaging the results generally provides an
even more accurate value for M. Computer algorithms
can transform the m/z spectrum into a single peak that


FIGURE 2 Obtaining protein sequence information with tandem
MS. (a) After proteolytic hydrolysis, a protein solution is injected
into a mass spectrometer (MS-1). The different peptides are sorted
so that only one type is selected for further analysis. The selected
peptide is further fragmented in a chamber between the two mass
spectrometers, and m/z for each fragment is measured in the second
mass spectrometer (MS-2). Many of the ions generated during
this second fragmentation result from breakage of the peptide bond,
as shown. These are called b-type or y-type ions, depending on
whether the charge is retained on the amino- or carboxyl-terminal
side, respectively. (b) A typical spectrum with peaks representing
the peptide fragments generated from a sample of one small peptide
(10 residues). The labeled peaks are y-type ions. The large peak
next to y5″ is a doubly charged ion and is not part of the y set. The
successive peaks differ by the mass of a particular amino acid in
the original peptide. In this case, the deduced sequence was
Phe-Pro-Gly-Gln-(Ile/Leu)-Asn-Ala-Asp-(Ile/Leu)-Arg. Note the
ambiguity about Ile and Leu residues, because they have the same
molecular mass. In this example, the set of peaks derived from y-type
ions predominates, and the spectrum is greatly simplified as a result.
This is because an Arg residue occurs at the carboxyl terminus
of the peptide, and most of the positive charges are retained on this
residue.


also provides a very accurate mass measurement (Fig.
1b, inset).
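
The two-peak mass calculation can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function name is hypothetical, X is taken as the proton mass (1.00728 Da, a standard value not given in the text), and the charge is rounded to the nearest integer. The charge n2 comes from equating the two expressions for M, which gives the numerator (m/z)1 − X.

```python
PROTON_MASS = 1.00728  # Da; standard value, used here as X

def mass_from_adjacent_peaks(mz_2, mz_1, x=PROTON_MASS):
    """Return (n2, M) from two neighboring ESI peaks.

    mz_2 is the peak carrying n2 charges; mz_1 is its neighbor with
    n2 + 1 charges, so mz_1 < mz_2.
    """
    if mz_1 >= mz_2:
        raise ValueError("mz_1 must be the lower-m/z (higher-charge) peak")
    # Rounding to a whole number of charges is an assumption of this sketch.
    n2 = round((mz_1 - x) / (mz_2 - mz_1))
    mass = n2 * (mz_2 - x)
    return n2, mass

# Synthetic peaks for a 47,342 Da protein carrying 50 and 51 protons.
M_TRUE = 47342.0
peak_50 = (M_TRUE + 50 * PROTON_MASS) / 50
peak_51 = (M_TRUE + 51 * PROTON_MASS) / 51
print(mass_from_adjacent_peaks(peak_50, peak_51))
```

With measured spectra the peaks carry experimental error, which is why averaging over several peak pairs, as described above, improves the estimate.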

Mass spectrometry can also be used to sequence 
short stretches of polypeptide, an application that has 
emerged as an invaluable tool for quickly identifying 
unknown proteins. Sequence information is extracted 
using a technique called tandem MS, or MS/MS. A 
solution containing the protein under investigation is 
first treated with a protease or chemical reagent to 
hydrolyze it to a mixture of shorter peptides. The mix¬ 
ture is then injected into a device that is essentially 
two mass spectrometers in tandem (Fig. 2a, top). In 
the first, the peptide mixture is sorted and the ion¬ 
ized fragments are manipulated so that only one of the 
several types of peptides produced by cleavage 
emerges at the other end. The sample of the selected 



peptide, each molecule of which has a charge some¬ 
where along its length, then travels through a vacuum 
chamber between the two mass spectrometers. In this 
collision cell, the peptide is further fragmented by 
high-energy impact with a “collision gas,” a small 
amount of a noble gas such as helium or argon that is 
bled into the vacuum chamber. This procedure is de¬ 
signed to fragment many of the peptide molecules in 
the sample, with each individual peptide broken in 
only one place, on average. Most breaks occur at pep¬ 
tide bonds. This fragmentation does not involve the 
addition of water (it is done in a near-vacuum), so the 
products may include molecular ion radicals such as 
carbonyl radicals (Fig. 2a, bottom). The charge on the 
original peptide is retained on one of the fragments 
generated from it. 

The second mass spectrometer then measures the 
m/z ratios of all the charged fragments (uncharged 
fragments are not detected). This generates one or 
more sets of peaks. A given set of peaks (Fig. 2b) con¬ 
sists of all the charged fragments that were generated 
by breaking the same type of bond (but at different 
points in the peptide) and are derived from the same 
side of the bond breakage, either the carboxyl- or 
amino-terminal side. Each successive peak in a given 
set has one less amino acid than the peak before. The 
difference in mass from peak to peak identifies the 
amino acid that was lost in each case, thus revealing 
the sequence of the peptide. The only ambiguities in¬ 
volve leucine and isoleucine, which have the same mass. 
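
Reading a sequence from peak-to-peak mass differences can be sketched as follows. This is an illustration under assumptions, not an instrument-grade routine: the ions are taken to be singly charged y-type ions, the residue masses are standard monoisotopic values (not given in the text), and the names `residue_for`, `read_ladder`, and `y_ladder` are hypothetical. The synthetic ladder is built from the peptide quoted in the Figure 2 caption.

```python
# Monoisotopic residue masses in Da (standard values; Leu and Ile are identical).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L/I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}

def residue_for(delta, tol=0.01):
    """Return the residue label whose mass matches delta within tol, else '?'."""
    for label, mass in RESIDUE_MASS.items():
        if abs(mass - delta) <= tol:
            return label
    return "?"

def read_ladder(peaks, tol=0.01):
    """Read residues from the mass differences of one singly charged ion series."""
    ordered = sorted(peaks)
    return [residue_for(b - a, tol) for a, b in zip(ordered, ordered[1:])]

def y_ladder(labels):
    """Build singly charged y-ion m/z values for labels given N- to C-terminus."""
    water, proton = 18.01056, 1.00728
    total, ladder = water + proton, []
    for label in reversed(labels):
        total += RESIDUE_MASS[label]
        ladder.append(total)
    return ladder

peptide = ["F", "P", "G", "Q", "L/I", "N", "A", "D", "L/I", "R"]
steps = read_ladder(y_ladder(peptide))
# y ions grow from the carboxyl terminus, so reverse to read N- to C-terminus.
print("-".join(reversed(steps)))
```

Ten peaks give nine mass differences, so the carboxyl-terminal Arg is not read from the differences; it would be inferred from the mass of the smallest y ion itself. The single "L/I" label reflects the leucine/isoleucine ambiguity noted in the text.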

The charge on the peptide can be retained on ei¬ 
ther the carboxyl- or amino-terminal fragment, and 


bonds other than the peptide bond can be broken in 
the fragmentation process, with the result that multi¬ 
ple sets of peaks are usually generated. The two most 
prominent sets generally consist of charged fragments 
derived from breakage of the peptide bonds. The set 
consisting of the carboxyl-terminal fragments can be 
unambiguously distinguished from that consisting of 
the amino-terminal fragments. Because the bond 
breaks generated between the spectrometers (in the 
collision cell) do not yield full carboxyl and amino 
groups at the sites of the breaks, the only intact α-amino
and α-carboxyl groups on the peptide fragments
are those at the very ends (Fig. 2a). The two
sets of fragments can thereby be identified by the re¬ 
sulting slight differences in mass. The amino acid se¬ 
quence derived from one set can be confirmed by the 
other, improving the confidence in the sequence in¬ 
formation obtained. 

Even a short sequence is often enough to permit 
unambiguous association of a protein with its gene, if 
the gene sequence is known. Sequencing by mass 
spectrometry cannot replace the Edman degradation 
procedure for the sequencing of long polypeptides, 
but it is ideal for proteomics research aimed at cata¬ 
loging the hundreds of cellular proteins that might be 
separated on a two-dimensional gel. In the coming 
decades, detailed genomic sequence data will be avail¬ 
able from hundreds, eventually thousands, of organ¬ 
isms. The ability to rapidly associate proteins with 
genes using mass spectrometry will greatly facilitate 
the exploitation of this extraordinary information 
resource. 


Small Peptides and Proteins Can Be 
Chemically Synthesized 

Many peptides are potentially useful as pharmacologic 
agents, and their production is of considerable com¬ 
mercial importance. There are three ways to obtain a 
peptide: (1) purification from tissue, a task often made 
difficult by the vanishingly low concentrations of some 
peptides; (2) genetic engineering (Chapter 9); or (3) di¬ 
rect chemical synthesis. Powerful techniques now make 
direct chemical synthesis an attractive option in many 
cases. In addition to commercial applications, the syn¬ 
thesis of specific peptide portions of larger proteins is 
an increasingly important tool for the study of protein 
structure and function. 

The complexity of proteins makes the traditional 
synthetic approaches of organic chemistry impractical 
for peptides with more than four or five amino acid 


residues. One problem is the difficulty of purifying the 
product after each step. 

The major breakthrough in this technology was 
provided by R. Bruce Merrifield in 1962. His innovation 
involved synthesizing a peptide while keeping it at¬ 
tached at one end to a solid support. The support is an 
insoluble polymer (resin) contained within a column, 
similar to that used for chromatographic procedures. 
The peptide is built up on this support one amino acid 
at a time using a standard set of reactions in a repeat¬ 
ing cycle (Fig. 3-29). At each successive step in the 
cycle, protective chemical groups block unwanted 
reactions. 

The technology for chemical peptide synthesis is 
now automated. As in the sequencing reactions already 
considered, the most important limitation of the process 
is the efficiency of each chemical cycle, as can be seen 
by calculating the overall yields of peptides of various 
lengths when the yield for addition of each new amino
acid is 96.0% versus 99.8% (Table 3-8). Incomplete re¬ 
action at one stage can lead to formation of an impurity 
(in the form of a shorter peptide) in the next. The 
chemistry has been optimized to permit the synthesis 


of proteins of 100 amino acid residues in a few days in 
reasonable yield. A very similar approach is used to 
synthesize nucleic acids (see Fig. 8-38). It is worth not¬ 
ing that this technology, impressive as it is, still pales 
when compared with biological processes. The same 



[Figure 3-29 label: Amino acid 1 with α-amino group protected by Fmoc group]

FIGURE 3-29 Chemical synthesis of a peptide on an insoluble polymer support.
Reactions ① through ④ are necessary for the formation of each peptide bond.
The 9-fluorenylmethoxycarbonyl (Fmoc) group (shaded blue) prevents unwanted
reactions at the α-amino group of the residue (shaded red). Chemical synthesis
proceeds from the carboxyl terminus to the amino terminus, the reverse of the
direction of protein synthesis in vivo (Chapter 27).
TABLE 3-8 Effect of Stepwise Yield on Overall Yield in Peptide Synthesis

                          Overall yield of final peptide (%)
Number of residues in     when the yield of each step is:
the final polypeptide     96.0%      99.8%

 11                       66         98
 21                       44         96
 31                       29         94
 51                       13         90
100                        1.7       82


100-amino-acid protein would be synthesized with ex¬ 
quisite fidelity in about 5 seconds in a bacterial cell. 
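
The dependence of overall yield on stepwise yield can be reproduced with a short calculation. This sketch assumes that a peptide of n residues requires n − 1 coupling cycles, each with the same yield; that assumption is inferred from the values in Table 3-8 rather than stated in the text, and the function name is hypothetical.

```python
def overall_yield(residues, step_yield):
    """Percent overall yield after (residues - 1) coupling cycles."""
    return 100.0 * step_yield ** (residues - 1)

for n in (11, 21, 31, 51, 100):
    low = overall_yield(n, 0.960)
    high = overall_yield(n, 0.998)
    print(f"{n:>3} residues: {low:5.1f}% at 96.0%, {high:5.1f}% at 99.8%")
```

Because the stepwise yield is raised to a power that grows with peptide length, a small loss per cycle compounds quickly, which is why the 96.0% column collapses for longer peptides while the 99.8% column does not.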

A variety of new methods for the efficient ligation 
(joining together) of peptides has made possible the as¬ 
sembly of synthetic peptides into larger proteins. With 
these methods, novel forms of proteins can be created 
with precisely positioned chemical groups, including 
those that might not normally be found in a cellular pro¬ 
tein. These novel forms provide new ways to test theo¬ 
ries of enzyme catalysis, to create proteins with new 
chemical properties, and to design protein sequences 
that will fold into particular structures. This last appli¬ 
cation provides the ultimate test of our increasing abil¬ 
ity to relate the primary structure of a peptide to the 
three-dimensional structure that it takes up in solution. 

Amino Acid Sequences Provide Important 
Biochemical Information 

Knowledge of the sequence of amino acids in a protein 
can offer insights into its three-dimensional structure 
and its function, cellular location, and evolution. Most 
of these insights are derived by searching for similari¬ 
ties with other known sequences. Thousands of se¬ 
quences are known and available in databases accessi¬ 
ble through the Internet. A comparison of a newly 
obtained sequence with this large bank of stored se¬ 
quences often reveals relationships both surprising and 
enlightening. 

Exactly how the amino acid sequence determines 
three-dimensional structure is not understood in detail, 
nor can we always predict function from sequence. 
However, protein families that have some shared struc¬ 
tural or functional features can be readily identified on 
the basis of amino acid sequence similarities. Individual 
proteins are assigned to families based on the degree of 
similarity in amino acid sequence. Members of a family 
are usually identical across 25% or more of their se¬ 
quences, and proteins in these families generally share 
at least some structural and functional characteristics. 
Some families are defined, however, by identities in¬ 
volving only a few amino acid residues that are critical 


to a certain function. A number of similar substructures 
(to be defined in Chapter 4 as “domains”) occur in many 
functionally unrelated proteins. These domains often 
fold into structural configurations that have an unusual 
degree of stability or that are specialized for a certain 
environment. Evolutionary relationships can also be in¬ 
ferred from the structural and functional similarities 
within protein families. 

Certain amino acid sequences serve as signals that 
determine the cellular location, chemical modification, 
and half-life of a protein. Special signal sequences, usu¬ 
ally at the amino terminus, are used to target certain 
proteins for export from the cell; other proteins are tar¬ 
geted for distribution to the nucleus, the cell surface, 
the cytosol, and other cellular locations. Other se¬ 
quences act as attachment sites for prosthetic groups, 
such as sugar groups in glycoproteins and lipids in 
lipoproteins. Some of these signals are well character¬ 
ized and are easily recognized in the sequence of a newly 
characterized protein (Chapter 27). 

SUMMARY 3.4 The Covalent Structure of Proteins

■ Differences in protein function result from
differences in amino acid composition and
sequence. Some variations in sequence are
possible for a particular protein, with little or
no effect on function.

■ Amino acid sequences are deduced by
fragmenting polypeptides into smaller peptides
using reagents known to cleave specific peptide
bonds; determining the amino acid sequence
of each fragment by the automated Edman
degradation procedure; then ordering the
peptide fragments by finding sequence overlaps
between fragments generated by different
reagents. A protein sequence can also be
deduced from the nucleotide sequence of its
corresponding gene in DNA.

■ Short proteins and peptides (up to about 100
residues) can be chemically synthesized. The
peptide is built up, one amino acid residue at
a time, while remaining tethered to a solid
support.

3.5 Protein Sequences and Evolution

The simple string of letters denoting the amino acid sequence
of a given protein belies the wealth of information
this sequence holds. As more protein sequences
have become available, the development of more powerful
methods for extracting information from them has
become a major biochemical enterprise. Each protein’s
function relies on its three-dimensional structure, which
in turn is determined largely by its primary structure.
Thus, the biochemical information conveyed by a protein
sequence is in principle limited only by our own understanding
of structural and functional principles. On
a different level of inquiry, protein sequences are beginning
to tell us how the proteins evolved and, ultimately,
how life evolved on this planet.

Protein Sequences Can Elucidate 
the History of Life on Earth 

The field of molecular evolution is often traced to Emile 
Zuckerkandl and Linus Pauling, whose work in the mid- 
1960s advanced the use of nucleotide and protein se¬ 
quences to explore evolution. The premise is deceptively 
straightforward. If two organisms are closely related, the 
sequences of their genes and proteins should be simi¬ 
lar. The sequences increasingly diverge as the evolu¬ 
tionary distance between two organisms increases. The 
promise of this approach began to be realized in the 
1970s, when Carl Woese used ribosomal RNA sequences 
to define archaebacteria as a group of living organisms 
distinct from other bacteria and eukaryotes (see Fig. 
1-4). Protein sequences offer an opportunity to greatly 
refine the available information. With the advent of 
genome projects investigating organisms from bacteria 
to humans, the number of available sequences is grow¬ 
ing at an enormous rate. This information can be used 
to trace biological history. The challenge is in learning 
to read the genetic hieroglyphics. 

Evolution has not taken a simple linear path. Com¬ 
plexities abound in any attempt to mine the evolution¬ 
ary information stored in protein sequences. For a given 
protein, the amino acid residues essential for the activ¬ 
ity of the protein are conserved over evolutionary time. 
The residues that are less important to function may 
vary over time—that is, one amino acid may substitute 
for another—and these variable residues can provide 
the information used to trace evolution. Amino acid sub¬ 
stitutions are not always random, however. At some po¬ 
sitions in the primary structure, the need to maintain 
protein function may mean that only particular amino 
acid substitutions can be tolerated. Some proteins have 
more variable amino acid residues than others. For these 
and other reasons, proteins can evolve at different rates. 

Another complicating factor in tracing evolutionary 
history is the rare transfer of a gene or group of genes 
from one organism to another, a process called lateral 
gene transfer. The transferred genes may be quite sim¬ 


ilar to the genes they were derived from in the original 
organism, whereas most other genes in the same two 
organisms may be quite distantly related. An example 
of lateral gene transfer is the recent rapid spread of 
antibiotic-resistance genes in bacterial populations. The 
proteins derived from these transferred genes would not 
be good candidates for the study of bacterial evolution, 
because they share only a very limited evolutionary his¬ 
tory with their “host” organisms. 

The study of molecular evolution generally focuses 
on families of closely related proteins. In most cases, the 
families chosen for analysis have essential functions in 
cellular metabolism that must have been present in the 
earliest viable cells, thus greatly reducing the chance 
that they were introduced relatively recently by lateral 
gene transfer. For example, a protein called EF-1α
(elongation factor 1α) is involved in the synthesis of proteins
in all eukaryotes. A similar protein, EF-Tu, with
the same function, is found in bacteria. Similarities in
sequence and function indicate that EF-1α and EF-Tu
are members of a family of proteins that share a com¬ 
mon ancestor. The members of protein families are 
called homologous proteins, or homologs. The con¬ 
cept of a homolog can be further refined. If two proteins 
within a family (that is, two homologs) are present in 
the same species, they are referred to as paralogs. Ho¬ 
mologs from different species are called orthologs 
(see Fig. 1-37). The process of tracing evolution involves 
first identifying suitable families of homologous proteins 
and then using them to reconstruct evolutionary paths. 

Homologs are identified using increasingly power¬ 
ful computer programs that can directly compare two 
or more chosen protein sequences, or can search vast 
databases to find the evolutionary relatives of one se¬ 
lected protein sequence. The electronic search process 
can be thought of as sliding one sequence past the other 
until a section with a good match is found. Within this 
sequence alignment, a positive score is assigned for each 
position where the amino acid residues in the two se¬ 
quences are identical—the value of the score varying 
from one program to the next—to provide a measure of 
the quality of the alignment. The process has some com¬ 
plications. Sometimes the proteins being compared 
match well at, say, two sequence segments, and these 
segments are connected by less related sequences of 
different lengths. Thus the two matching segments can¬ 
not be aligned at the same time. To handle this, the com¬ 
puter program introduces “gaps” in one of the sequences 
to bring the matching segments into register (Fig. 3-30). 


E. coli      TGNRTIAV
B. subtilis  DEDQTILL

FIGURE 3-30 Aligning protein sequences with the use of gaps.
Shown here is the sequence alignment of a short section of the EF-Tu
protein from two well-studied bacterial species, E. coli and Bacillus
subtilis. Introduction of a gap in the B. subtilis sequence allows a better
alignment of amino acid residues on either side of the gap. Identical
amino acid residues are shaded.
Of course, if a sufficient number of gaps are introduced,
almost any two sequences could be brought into some 
sort of alignment. To avoid uninformative alignments, 
the programs include penalties for each gap introduced, 
thus lowering the overall alignment score. With elec¬ 
tronic trial and error, the program selects the alignment 
with the optimal score that maximizes identical amino 
acid residues while minimizing the introduction of gaps. 
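
The trade-off between identities and gap penalties can be sketched with a small dynamic-programming aligner. This is one common way to find the optimal score (a global alignment in the style of Needleman and Wunsch), offered as an illustration rather than as the method of any particular program; the scoring values (+1 per identity, 0 per mismatch, −2 per gap position), the function name, and the two test sequences are all hypothetical.

```python
def align(a, b, match=1, mismatch=0, gap=-2):
    """Global alignment by dynamic programming; returns (score, a_row, b_row)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + pair,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    i, j, top, bottom = len(a), len(b), [], []
    while i > 0 or j > 0:
        pair = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair:
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[len(a)][len(b)], "".join(reversed(top)), "".join(reversed(bottom))

# Hypothetical sequences: the second lacks two residues present in the first.
total, row_a, row_b = align("YDLGGGTFDISI", "YDLGTFDISI")
print(total)
print(row_a)
print(row_b)
```

Here the penalty is charged per gap position. Making gaps cheaper allows more of them into the optimal alignment, which illustrates why an alignment built from many gaps carries little information.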

Identical amino acids are often inadequate to iden¬ 
tify related proteins or, more importantly, to determine 
how closely related the proteins are on an evolutionary 
time scale. A more useful analysis includes a consider¬ 
ation of the chemical properties of substituted amino 
acids. When amino acid substitutions are found within 
a protein family, many of the differences may be con¬ 
servative—that is, an amino acid residue is replaced by 
a residue having similar chemical properties. For ex¬ 
ample, a Glu residue may substitute in one family mem¬ 
ber for the Asp residue found in another; both amino 
acids are negatively charged. Such a conservative sub¬ 
stitution should logically garner a higher score in a se¬ 
quence alignment than does a nonconservative substi¬ 
tution, such as the replacement of the Asp residue with 
a hydrophobic Phe residue. 


To determine what scores to assign to the many dif¬ 
ferent amino acid substitutions, Steven Henikoff and 
Jorja Henikoff examined the aligned sequences from a 
variety of different proteins. They did not analyze en¬ 
tire protein sequences, focusing instead on thousands 
of short conserved blocks where the fraction of identi¬ 
cal amino acids was high and the alignments were thus 
reliable. Looking at the aligned sequence blocks, the 
Henikoffs analyzed the nonidentical amino acid residues 
within the blocks. Higher scores were given to non¬ 
identical residues that occurred frequently than to those 
that appeared rarely. Even the identical residues were 
given scores based on how often they were replaced, 
such that amino acids with unique chemical properties 
(such as Cys and Trp) received higher scores than those 
more conservatively replaced (such as Asp and Glu). 
The result of this scoring system is a Blosum (blocks
substitution matrix) table. The table in Figure 3-31 was
generated from sequences that were identical in at least 
62% of their amino acid residues, and it is thus referred 
to as Blosum62. Similar tables have been generated for 
blocks of homologous sequences that are 50% or 80% 
identical. When higher levels of identity are required, 
the most conservative amino acid substitutions can be 
FIGURE 3-31 The Blosum62 table. This blocks substitution matrix
was created by comparing thousands of short blocks of aligned se¬ 
quences that were identical in at least 62% of their amino acid 
residues. The nonidentical residues were assigned scores based on 
how frequently they were replaced by each of the other amino acids. 
Each substitution contributes to the score given to a particular align¬ 
ment. Positive numbers (shaded yellow) add to the score for a partic¬ 
ular alignment; negative numbers subtract from the score. Identical 


residues in sequences being compared (the shaded diagonal from top 
left to bottom right in the matrix) receive scores based on how often 
they are replaced, such that amino acids with unique chemical prop¬ 
erties (e.g., Cys and Trp) receive higher scores (9 and 11, respectively) 
than those more easily replaced in conservative substitutions (e.g., Asp 
(6) and Glu (5)). Many computer programs use Blosum62 to assign 
scores to new sequence alignments. 
FIGURE 3-32 A signature sequence in the EF-1α/EF-Tu protein
family. The signature sequence (boxed) is a 12-amino-acid insertion 
near the amino terminus of the sequence. Residues that align in all 
species are shaded yellow. Both archaebacteria and eukaryotes have 


the signature, although the sequences of the insertions are quite dis¬ 
tinct for the two groups. The variation in the signature sequence re¬ 
flects the significant evolutionary divergence that has occurred at this 
site since it first appeared in a common ancestor of both groups. 


overrepresented, which limits the usefulness of the ma¬ 
trix in identifying homologs that are somewhat distantly 
related. Tests have shown that the Blosum62 table pro¬ 
vides the most reliable alignments over a wide range of 
protein families, and it is the default table in many se¬ 
quence alignment programs. 
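
How a substitution table changes an alignment score can be sketched with a handful of Blosum62 entries. This is a minimal illustration: the diagonal values (Cys 9, Trp 11, Asp 6, Glu 5) are those quoted for Figure 3-31, the Asp/Glu (+2) and Asp/Phe (−3) values are taken from the published Blosum62 matrix, and the names `BLOSUM62_EXCERPT`, `pair_score`, and `ungapped_score` are hypothetical.

```python
# A small excerpt of Blosum62; a real program would load the full 20 x 20 table.
BLOSUM62_EXCERPT = {
    ("C", "C"): 9, ("W", "W"): 11, ("D", "D"): 6, ("E", "E"): 5,
    ("D", "E"): 2,   # conservative: both residues are negatively charged
    ("D", "F"): -3,  # nonconservative: charged versus hydrophobic
}

def pair_score(x, y):
    """Look up a symmetric substitution score from the excerpt."""
    if (x, y) in BLOSUM62_EXCERPT:
        return BLOSUM62_EXCERPT[(x, y)]
    return BLOSUM62_EXCERPT[(y, x)]

def ungapped_score(a, b):
    """Sum the substitution scores over two equal-length aligned sequences."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    return sum(pair_score(x, y) for x, y in zip(a, b))

print(ungapped_score("CDW", "CDW"))  # all identical
print(ungapped_score("CDW", "CEW"))  # conservative Asp -> Glu
print(ungapped_score("CDW", "CFW"))  # nonconservative Asp -> Phe
```

Pairs missing from the excerpt raise a `KeyError`. The three scores fall in the order identical, conservative, nonconservative, which is the ranking the text argues a useful scoring system should produce.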

For most efforts to find homologies and explore evo¬ 
lutionary relationships, protein sequences (derived ei¬ 
ther directly from protein sequencing or from the se¬ 
quencing of the DNA encoding the protein) are superior 
to nongenic nucleic acid sequences (those that do not 
encode a protein or functional RNA). For a nucleic acid, 
with its four different types of residues, random align¬ 
ment of nonhomologous sequences will generally yield 
matches for at least 25% of the positions. Introduction 
of a few gaps can often increase the fraction of matched 
residues to 40% or more, and the probability of chance 
alignment of unrelated sequences becomes quite high. 
The 20 different amino acid residues in proteins greatly 
lower the probability of uninformative chance align¬ 
ments of this type. 
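
The 25% figure for nucleic acids follows from a short probability calculation. This sketch assumes residues are drawn independently, and the uniform compositions are an idealization; the function name is hypothetical.

```python
def expected_identity(frequencies):
    """Probability that two independently drawn residues are identical."""
    return sum(p * p for p in frequencies)

nucleotides = [1 / 4] * 4    # four equally frequent bases (assumed)
amino_acids = [1 / 20] * 20  # twenty equally frequent residues (assumed)

print(expected_identity(nucleotides))
print(expected_identity(amino_acids))
```

Under these assumptions the chance identity is one in four for nucleotides and one in twenty for amino acids. Any uneven composition raises the sum of squared frequencies, which is consistent with the text's "at least 25%" for nucleic acids.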

The programs used to generate a sequence align¬ 
ment are complemented by methods that test the reli¬ 
ability of the alignments. A common computerized test 
is to shuffle the amino acid sequence of one of the pro¬ 
teins being compared to produce a random sequence, 
then instruct the program to align the shuffled sequence 
with the other, unshuffled one. Scores are assigned to 
the new alignment, and the shuffling and alignment 
process is repeated many times. The original alignment, 
before shuffling, should have a score significantly higher 
than any of those within the distribution of scores gen¬ 
erated by the random alignments; this increases the con¬ 
fidence that the sequence alignment has identified a pair 
of homologs. Note that the absence of a significant align¬ 
ment score does not necessarily mean that no evolu¬ 
tionary relationship exists between two proteins. As we 
shall see in Chapter 4, three-dimensional structural sim¬ 
ilarities sometimes reveal evolutionary relationships 
where sequence homology has been wiped away by time. 
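
The shuffling test can be sketched as follows. This is an illustration under assumptions: the score is a simple count of identities at the best ungapped offset (the "sliding" picture used earlier) rather than a full gapped alignment score, the function names are hypothetical, and the example sequence is made up.

```python
import random

def best_ungapped_identities(a, b):
    """Slide b past a and return the largest number of identical aligned residues."""
    best = 0
    for offset in range(-(len(b) - 1), len(a)):
        matches = sum(
            1
            for j, residue in enumerate(b)
            if 0 <= offset + j < len(a) and a[offset + j] == residue
        )
        best = max(best, matches)
    return best

def shuffle_test(a, b, trials=200, seed=0):
    """Compare the real score with scores obtained after shuffling b."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = best_ungapped_identities(a, b)
    shuffled_scores = []
    for _ in range(trials):
        letters = list(b)
        rng.shuffle(letters)
        shuffled_scores.append(best_ungapped_identities(a, "".join(letters)))
    at_least = sum(1 for s in shuffled_scores if s >= observed)
    return observed, shuffled_scores, at_least / trials

# Hypothetical sequence compared with itself, so the observed score is maximal.
seq = "MKTAYIAKQRQISFVKSHFSRQ"
observed, scores, fraction = shuffle_test(seq, seq, trials=50)
print(observed, max(scores), fraction)
```

Shuffling preserves the amino acid composition of the sequence while destroying its order, so the shuffled scores estimate what composition alone would produce. The returned fraction is the share of shuffled alignments scoring at least as well as the original.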

Using a protein family to explore evolution requires the identification
of family members with similar molecular functions in the widest
possible range of organisms. Information from the family can then be
used to trace the evolution of those organisms. By analyzing the
sequence divergence in selected protein families, investigators can
segregate organisms into classes based on their evolutionary
relationships. This information must be reconciled with more classical
examinations of the physiology and biochemistry of the organisms.

Certain segments of a protein sequence may be found in the organisms of
one taxonomic group but not in other groups; these segments can be used
as signature sequences for the group in which they are found. An
example of a signature sequence is an insertion of 12 amino acids near
the amino terminus of the EF-1α/EF-Tu proteins in all archaebacteria
and eukaryotes but not in other types of bacteria (Fig. 3-32). The
signature is one of many biochemical clues that can help establish the
evolutionary relatedness of eukaryotes and archaebacteria. As another
example, the major taxa of bacteria can be distinguished by signature
sequences in several different proteins. The β and γ proteobacteria
have signature sequences in the Hsp70 and DNA gyrase protein families
(families of proteins involved in protein folding and DNA replication,
respectively) that are not present in any other bacteria, including the
other proteobacteria. The other types of proteobacteria (α, δ, ε),
along with the β and γ proteobacteria, have a separate Hsp70 signature
sequence and a signature in alanyl-tRNA synthetase (an enzyme of
protein synthesis) that are not present in other bacteria. The
appearance of unique signatures in the β and γ proteobacteria suggests
that the α, δ, and ε proteobacteria arose before their β and γ cousins.

By considering the entire sequence of a protein, researchers can now
construct more elaborate evolutionary trees with many species in each
taxonomic group. Figure 3-33 presents one such tree for bacteria, based
on sequence divergence in the protein GroEL (a protein present in all
bacteria that assists in the proper folding of proteins). The tree can
be refined by basing it on the sequences of multiple proteins and by
supplementing the sequence information with data on the unique
biochemical and physiological properties of each species. There are
many methods for generating trees, each with its own advantages and
shortcomings, and many ways to represent the resulting evolutionary
relationships. In Figure 3-33, the free end points of lines are called
“external nodes”; each represents an extant species, and each is so
labeled. The points where two lines come together, the “internal
nodes,” represent extinct ancestor species. In most representations
(including Fig. 3-33), the lengths of the lines connecting the nodes
are proportional to the number of amino acid substitutions separating
one species from another. If we trace two extant species to a common
internal node (representing the common ancestor of the two species),
the length of the branch connecting each external node to the internal
node represents the number of amino acid substitutions separating one
extant species from this ancestor. The sum of the lengths of all the
line segments that connect an extant species to another extant species
through a common ancestor reflects the number of substitutions
separating the two extant species. To determine how much time was
needed for the various species to diverge, the tree must be calibrated
by comparing it with information from the fossil record and other
sources.
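
The way branch lengths add up between two extant species can be made concrete with a small calculation. The tree below is entirely hypothetical (invented node names and substitution counts, not data from Fig. 3-33); it is stored as a mapping from each node to its parent and the length of the branch joining them.

```python
# child -> (parent, substitutions on the branch leading to the child)
# All names and numbers are invented for illustration.
tree = {
    "species_A": ("ancestor_1", 3),
    "species_B": ("ancestor_1", 5),
    "ancestor_1": ("root", 4),
    "species_C": ("root", 10),
}

def path_to_root(node, tree):
    """Map each ancestor of `node` (and `node` itself) to the summed
    branch length separating it from `node`."""
    distances = {node: 0}
    total = 0
    while node in tree:
        parent, length = tree[node]
        total += length
        distances[parent] = total
        node = parent
    return distances

def substitutions_between(x, y, tree):
    """Sum of branch lengths joining x and y through their most recent
    common ancestor (the shared node giving the shortest total path)."""
    dx = path_to_root(x, tree)
    dy = path_to_root(y, tree)
    return min(dx[n] + dy[n] for n in dx if n in dy)

print(substitutions_between("species_A", "species_B", tree))  # 3 + 5
print(substitutions_between("species_A", "species_C", tree))  # 3 + 4 + 10
```

These sums are substitution counts, not times. Converting them to divergence times requires the external calibration described above.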

As more sequence information is made available in databases, we can
generate evolutionary trees based on a variety of different proteins.
Some proteins evolve faster than others, or change faster within one
group of species than another. A large protein, with many variable
amino acid residues, may exhibit a few differences between two closely
related species. Another, smaller protein may be identical in the same
two species. For many reasons, some details of an evolutionary tree
based on the sequences of one protein may differ from those of a tree
based on the sequences of another protein. Increasingly sophisticated
analyses using the sequences of many different proteins can provide an
exquisitely detailed and accurate picture of evolutionary
relationships. The story is a work in progress, and the questions being
asked and answered are fundamental to how humans view themselves and
the world around them. The field of molecular evolution promises to be
among the most vibrant of the scientific frontiers in the twenty-first
century.


SUMMARY 3.5 Protein Sequences and Evolution 


■ Protein sequences are a rich source of 
information about protein structure and 
function, as well as the evolution of life on this 
planet. Sophisticated methods are being 
developed to trace evolution by analyzing the 
resultant slow changes in the amino acid 
sequences of homologous proteins. 



FIGURE 3-33 Evolutionary tree derived from amino acid sequence comparisons. A bacterial
evolutionary tree, based on the sequence divergence observed in the GroEL family of proteins.
Also included in this tree (lower right) are the chloroplasts (chl.) of some nonbacterial species.


Problems


1. Absolute Configuration of Citrulline The citrulline
isolated from watermelons has the structure shown below.
Is it a D- or L-amino acid? Explain.

    CH2(CH2)2NH—C(=O)—NH2
    |
  H—C—NH3+
    |
    COO−

2. Relationship between the Titration Curve and the
Acid-Base Properties of Glycine A 100 mL solution of
0.1 M glycine at pH 1.72 was titrated with 2 M NaOH solution.
The pH was monitored and the results were plotted on a
graph, as shown at right. The key points in the titration are
designated I to V. For each of the statements (a) to (o),
identify the appropriate key point in the titration and justify
your choice.

(a) Glycine is present predominantly as the species
+H3N—CH2—COOH.

(b) The average net charge of glycine is +1/2.

(c) Half of the amino groups are ionized.

(d) The pH is equal to the pKa of the carboxyl group.

(e) The pH is equal to the pKa of the protonated amino
group.

(f) Glycine has its maximum buffering capacity. 

(g) The average net charge of glycine is zero. 

(h) The carboxyl group has been completely titrated 
(first equivalence point). 

(i) Glycine is completely titrated (second equivalence 
point). 

(j) The predominant species is +H3N—CH2—COO−.

(k) The average net charge of glycine is −1.

(l) Glycine is present predominantly as a 50:50 mixture
of +H3N—CH2—COOH and +H3N—CH2—COO−.

(m) This is the isoelectric point. 

(n) This is the end of the titration. 

(o) These are the worst pH regions for buffering power. 



3. How Much Alanine Is Present as the Completely
Uncharged Species? At a pH equal to the isoelectric point
of alanine, the net charge on alanine is zero. Two structures
can be drawn that have a net charge of zero, but the
predominant form of alanine at its pI is zwitterionic.

       CH3                        CH3
       |                          |
  +H3N—C—COO−                 H2N—C—COOH
       |                          |
       H                          H

   Zwitterionic                Uncharged

(a) Why is alanine predominantly zwitterionic rather
than completely uncharged at its pI?

(b) What fraction of alanine is in the completely
uncharged form at its pI? Justify your assumptions.

4. Ionization State of Amino Acids Each ionizable group
of an amino acid can exist in one of two states, charged or
neutral. The electric charge on the functional group is
determined by the relationship between its pKa and the pH of
the solution. This relationship is described by the
Henderson-Hasselbalch equation.

(a) Histidine has three ionizable functional groups.
Write the equilibrium equations for its three ionizations and
assign the proper pKa for each ionization. Draw the structure
of histidine in each ionization state. What is the net charge
on the histidine molecule in each ionization state?

(b) Draw the structures of the predominant ionization
state of histidine at pH 1, 4, 8, and 12. Note that the
ionization state can be approximated by treating each ionizable
group independently.

(c) What is the net charge of histidine at pH 1, 4, 8, and
12? For each pH, will histidine migrate toward the anode (+)
or cathode (−) when placed in an electric field?
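
The relationship between pH, pKa, and charge in this problem can be explored numerically. The sketch below is a generic illustration of the Henderson-Hasselbalch equation for one ionizable group treated independently; the pKa of 6.0 is an arbitrary example value, and the function names are invented here.

```python
def fraction_protonated(pH, pKa):
    """Fraction of a group in its protonated (HA) form.
    From pH = pKa + log([A-]/[HA]):  [HA]/([HA] + [A-]) = 1/(1 + 10**(pH - pKa))."""
    return 1.0 / (1.0 + 10 ** (pH - pKa))

def group_charge(pH, pKa, charge_when_protonated):
    """Average charge of one group: use +1 for a group that is cationic
    when protonated (loses its charge on deprotonation) and 0 for a group
    that is neutral when protonated (becomes anionic on deprotonation)."""
    f = fraction_protonated(pH, pKa)
    return charge_when_protonated - (1.0 - f)

example_pKa = 6.0  # arbitrary illustrative value
for pH in (4.0, 6.0, 8.0):
    print(pH,
          round(fraction_protonated(pH, example_pKa), 3),
          round(group_charge(pH, example_pKa, +1), 3))
```

When the pH equals the pKa, half of the groups are protonated. Two pH units away from the pKa, about 99% of the groups are in a single state, which is why the predominant form can be read off by comparing pH with each pKa in turn.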

5. Separation of Amino Acids by Ion-Exchange
Chromatography Mixtures of amino acids are analyzed by first
separating the mixture into its components through
ion-exchange chromatography. Amino acids placed on a
cation-exchange resin containing sulfonate groups (see Fig. 3-18a)
flow down the column at different rates because of two factors
that influence their movement: (1) ionic attraction between
the —SO3− residues on the column and positively
charged functional groups on the amino acids, and (2)
hydrophobic interactions between amino acid side chains and
the strongly hydrophobic backbone of the polystyrene resin.
For each pair of amino acids listed, determine which will be
eluted first from an ion-exchange column using a pH 7.0
buffer.

(a) Asp and Lys 

(b) Arg and Met 

(c) Glu and Val 

(d) Gly and Leu 

(e) Ser and Ala 

6. Naming the Stereoisomers of Isoleucine The
structure of the amino acid isoleucine is

       COO−
       |
  +H3N—C—H
       |
     H—C—CH3
       |
       CH2
       |
       CH3


(a) How many chiral centers does it have? 

(b) How many optical isomers? 

(c) Draw perspective formulas for all the optical isomers 
of isoleucine. 

7. Comparing the pKa Values of Alanine and
Polyalanine The titration curve of alanine shows the ionization of
two functional groups with pKa values of 2.34 and 9.69,
corresponding to the ionization of the carboxyl and the protonated
amino groups, respectively. The titration of di-, tri-, and larger
oligopeptides of alanine also shows the ionization of only two
functional groups, although the experimental pKa values are
different. The trend in pKa values is summarized in the table.

Amino acid or peptide      pK1     pK2
Ala                        2.34    9.69
Ala-Ala                    3.12    8.30
Ala-Ala-Ala                3.39    8.03
Ala-(Ala)n-Ala, n ≥ 4      3.42    7.94

(a) Draw the structure of Ala-Ala-Ala. Identify the
functional groups associated with pK1 and pK2.

(b) Why does the value of pK1 increase with each
addition of an Ala residue to the Ala oligopeptide?

(c) Why does the value of pK2 decrease with each
addition of an Ala residue to the Ala oligopeptide?

8. The Size of Proteins What is the approximate
molecular weight of a protein with 682 amino acid residues in a
single polypeptide chain?

9. The Number of Tryptophan Residues in Bovine
Serum Albumin A quantitative amino acid analysis reveals
that bovine serum albumin (BSA) contains 0.58% tryptophan
(Mr 204) by weight.

(a) Calculate the minimum molecular weight of BSA
(i.e., assuming there is only one tryptophan residue per
protein molecule).

(b) Gel filtration of BSA gives a molecular weight
estimate of 70,000. How many tryptophan residues are present
in a molecule of serum albumin?

10. Net Electric Charge of Peptides A peptide has the
sequence

Glu-His-Trp-Ser-Gly-Leu-Arg-Pro-Gly

(a) What is the net charge of the molecule at pH 3, 8,
and 11? (Use pKa values for side chains and terminal amino
and carboxyl groups as given in Table 3-1.)

(b) Estimate the pI for this peptide.

11. Isoelectric Point of Pepsin Pepsin is the name given
to several digestive enzymes secreted (as larger precursor
proteins) by glands that line the stomach. These glands also
secrete hydrochloric acid, which dissolves the particulate
matter in food, allowing pepsin to enzymatically cleave
individual protein molecules. The resulting mixture of food, HCl,
and digestive enzymes is known as chyme and has a pH near
1.5. What pI would you predict for the pepsin proteins? What
functional groups must be present to confer this pI on pepsin?
Which amino acids in the proteins would contribute such
groups?

12. The Isoelectric Point of Histones Histones are
proteins found in eukaryotic cell nuclei, tightly bound to DNA,
which has many phosphate groups. The pI of histones is very
high, about 10.8. What amino acid residues must be present
in relatively large numbers in histones? In what way do these
residues contribute to the strong binding of histones to DNA?

13. Solubility of Polypeptides One method for
separating polypeptides makes use of their differential solubilities.
The solubility of large polypeptides in water depends upon
the relative polarity of their R groups, particularly on the
number of ionized groups: the more ionized groups there are, the
more soluble the polypeptide. Which of each pair of the
polypeptides that follow is more soluble at the indicated pH?

(a) (Gly)20 or (Glu)20 at pH 7.0

(b) (Lys-Ala)3 or (Phe-Met)3 at pH 7.0

(c) (Ala-Ser-Gly)5 or (Asn-Ser-His)5 at pH 6.0

(d) (Ala-Asp-Gly)5 or (Asn-Ser-His)5 at pH 3.0


14. Purification of an Enzyme A biochemist discovers
and purifies a new enzyme, generating the purification table
below.

Procedure                            Total protein (mg)   Activity (units)
1. Crude extract                           20,000            4,000,000
2. Precipitation (salt)                     5,000            3,000,000
3. Precipitation (pH)                       4,000            1,000,000
4. Ion-exchange chromatography                200              800,000
5. Affinity chromatography                     50              750,000
6. Size-exclusion chromatography               45              675,000


(a) From the information given in the table, calculate
the specific activity of the enzyme solution after each
purification procedure.

(b) Which of the purification procedures used for this
enzyme is most effective (i.e., gives the greatest relative
increase in purity)?

(c) Which of the purification procedures is least effective?

(d) Is there any indication based on the results shown
in the table that the enzyme after step 6 is now pure? What
else could be done to estimate the purity of the enzyme
preparation?

15. Sequence Determination of the Brain Peptide
Leucine Enkephalin A group of peptides that influence
nerve transmission in certain parts of the brain has been
isolated from normal brain tissue. These peptides are known as
opioids, because they bind to specific receptors that also bind
opiate drugs, such as morphine and naloxone. Opioids thus
mimic some of the properties of opiates. Some researchers
consider these peptides to be the brain’s own pain killers.
Using the information below, determine the amino acid sequence
of the opioid leucine enkephalin. Explain how your structure
is consistent with each piece of information.

(a) Complete hydrolysis by 6 M HCl at 110 °C followed
by amino acid analysis indicated the presence of Gly, Leu,
Phe, and Tyr, in a 2:1:1:1 molar ratio.

(b) Treatment of the peptide with 1-fluoro-2,4-dinitrobenzene
followed by complete hydrolysis and chromatography
indicated the presence of the 2,4-dinitrophenyl
derivative of tyrosine. No free tyrosine could be found.

(c) Complete digestion of the peptide with pepsin
followed by chromatography yielded a dipeptide containing Phe
and Leu, plus a tripeptide containing Tyr and Gly in a 1:2 ratio.

16. Structure of a Peptide Antibiotic from Bacillus
brevis Extracts from the bacterium Bacillus brevis contain a
peptide with antibiotic properties. This peptide forms
complexes with metal ions and apparently disrupts ion transport
across the cell membranes of other bacterial species, killing
them. The structure of the peptide has been determined from
the following observations.

(a) Complete acid hydrolysis of the peptide followed by
amino acid analysis yielded equimolar amounts of Leu, Orn,
Phe, Pro, and Val. Orn is ornithine, an amino acid not present
in proteins but present in some peptides. It has the structure

+H3N—CH2—CH2—CH2—CH(NH3+)—COO−

(b) The molecular weight of the peptide was estimated
as about 1,200.

(c) The peptide failed to undergo hydrolysis when
treated with the enzyme carboxypeptidase. This enzyme
catalyzes the hydrolysis of the carboxyl-terminal residue of a
polypeptide unless the residue is Pro or, for some reason,
does not contain a free carboxyl group.

(d) Treatment of the intact peptide with
1-fluoro-2,4-dinitrobenzene, followed by complete hydrolysis and
chromatography, yielded only free amino acids and the following
derivative:

(Hint: Note that the 2,4-dinitrophenyl derivative involves the
amino group of a side chain rather than the α-amino group.)

(e) Partial hydrolysis of the peptide followed by
chromatographic separation and sequence analysis yielded the
following di- and tripeptides (the amino-terminal amino acid is
always at the left):

Leu-Phe Phe-Pro Orn-Leu Val-Orn

Val-Orn-Leu Phe-Pro-Val Pro-Val-Orn

Given the above information, deduce the amino acid sequence
of the peptide antibiotic. Show your reasoning. When you
have arrived at a structure, demonstrate that it is consistent
with each experimental observation.

17. Efficiency in Peptide Sequencing A peptide with the
primary structure Lys-Arg-Pro-Leu-Ile-Asp-Gly-Ala is
sequenced by the Edman procedure. If each Edman cycle is
96% efficient, what percentage of the amino acids liberated
in the fourth cycle will be leucine? Do the calculation a
second time, but assume a 99% efficiency for each cycle.

18. Biochemistry Protocols: Your First Protein
Purification As the newest and least experienced student in a
biochemistry research lab, your first few weeks are spent
washing glassware and labeling test tubes. You then graduate
to making buffers and stock solutions for use in various
laboratory procedures. Finally, you are given responsibility for
purifying a protein. It is a citric acid cycle enzyme, citrate
synthase, located in the mitochondrial matrix. Following a
protocol for the purification, you proceed through the steps
below. As you work, a more experienced student questions
you about the rationale for each procedure. Supply the
answers. (Hint: See Chapter 2 for information about osmolarity;
see p. 6 for information on separation of organelles from
cells.)

(a) You pick up 20 kg of beef hearts from a nearby
slaughterhouse. You transport the hearts on ice, and perform
each step of the purification on ice or in a walk-in cold room.
You homogenize the beef heart tissue in a high-speed blender
in a medium containing 0.2 M sucrose, buffered to a pH of 7.2.
Why do you use beef heart tissue, and in such large
quantity? What is the purpose of keeping the tissue cold and
suspending it in 0.2 M sucrose, at pH 7.2? What happens
to the tissue when it is homogenized?

(b) You subject the resulting heart homogenate, which
is dense and opaque, to a series of differential centrifugation
steps. What does this accomplish?

(c) You proceed with the purification using the
supernatant fraction that contains mostly intact mitochondria. Next
you osmotically lyse the mitochondria. The lysate, which is
less dense than the homogenate, but still opaque, consists
primarily of mitochondrial membranes and internal
mitochondrial contents. To this lysate you add ammonium sulfate,
a highly soluble salt, to a specific concentration. You
centrifuge the solution, decant the supernatant, and discard the
pellet. To the supernatant, which is clearer than the lysate,
you add more ammonium sulfate. Once again, you centrifuge
the sample, but this time you save the pellet because it
contains the protein of interest. What is the rationale for the
two-step addition of the salt?

(d) You solubilize the ammonium sulfate pellet
containing the mitochondrial proteins and dialyze it overnight against
large volumes of buffered (pH 7.2) solution. Why isn’t
ammonium sulfate included in the dialysis buffer? Why do
you use the buffer solution instead of water?


(e) You run the dialyzed solution over a size-exclusion
chromatographic column. Following the protocol, you collect
the first protein fraction that exits the column, and discard
the rest of the fractions that elute from the column later. You
detect the protein by measuring UV absorbance (at 280 nm)
in the fractions. What does the instruction to collect the
first fraction tell you about the protein? Why is UV
absorbance at 280 nm a good way to monitor for the
presence of protein in the eluted fractions?

(f) You place the fraction collected in (e) on a
cation-exchange chromatographic column. After discarding the
initial solution that exits the column (the flowthrough), you add
a washing solution of higher pH to the column and collect the
protein fraction that immediately elutes. Explain what you
are doing.

(g) You run a small sample of your fraction, now very
reduced in volume and quite clear (though tinged pink), on
an isoelectric focusing gel. When stained, the gel shows three
sharp bands. According to the protocol, the protein of
interest is the one with the pI of 5.6, but you decide to do one
more assay of the protein’s purity. You cut out the pI 5.6 band
and subject it to SDS polyacrylamide gel electrophoresis. The
protein resolves as a single band. Why were you
unconvinced of the purity of the “single” protein band on your
isoelectric focusing gel? What did the results of the SDS
gel tell you? Why is it important to do the SDS gel
electrophoresis after the isoelectric focusing?



